Statistics distributions

Tutorials

Statistics How To (thorough up to university level): https://www.statisticshowto.com/probability-and-statistics/
StatQuest YouTube channel (percentiles, normalization, etc.)

Histograms

A bar graph that reveals the distribution of the data.
Values are grouped into equal-width ranges (bins), and each bar shows how many values fall into its bin.


           |
           |
Frequency  |        |
  of       |        |
 ${X}      |  |     |
           |  |     |     |
           |  |     |     |
           +----------------------
             1-8  9-16  17-24
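
A minimal sketch of the binning step in Python, assuming integer values and bins 8 wide starting at 1 (as in the chart above). The `histogram` helper and the `ages` sample data are hypothetical, made up to reproduce the bar heights shown.

```python
from collections import Counter

def histogram(values, bin_width, start=1):
    """Count how many values fall into each equal-width bin.

    Assumes integer values that are all >= start.
    """
    # Map every value to the index of the bin it belongs to.
    counts = Counter((v - start) // bin_width for v in values)
    result = {}
    for index in range(max(counts) + 1):
        low = start + index * bin_width
        high = low + bin_width - 1
        # Empty bins are kept so gaps in the data stay visible.
        result[f"{low}-{high}"] = counts.get(index, 0)
    return result

# Hypothetical sample data.
ages = [3, 5, 7, 10, 11, 12, 15, 16, 18, 20]
bins = histogram(ages, bin_width=8)

for label, count in bins.items():
    print(f"{label:>5} | {'#' * count}")
```

The bin width is a choice: wider bins smooth the shape, narrower bins show more detail but more noise.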

There are different measurements of frequency:

# (ex. 2x yr1 students, 3x yr2 students)
absolute:    each group is counted on its own (yr1==2, yr2==3)
cumulative:  each group includes the previous groups' results (yr1==2, yr2==5)
relative:    each group's share of the total (yr1==2/5, yr2==3/5)
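
A short Python sketch of three common ways to report frequency, using the student counts from the example: raw counts (absolute), running totals (cumulative), and shares of the total (relative). The variable names are illustrative only.

```python
from itertools import accumulate

# Counts from the example: 2 first-year and 3 second-year students.
counts = {"yr1": 2, "yr2": 3}
total = sum(counts.values())

# Absolute: each group counted on its own.
absolute = dict(counts)
# Cumulative: running total, each group includes all previous groups.
cumulative = dict(zip(counts, accumulate(counts.values())))
# Relative: each group's share of the total.
relative = {group: n / total for group, n in counts.items()}

print(absolute)    # {'yr1': 2, 'yr2': 3}
print(cumulative)  # {'yr1': 2, 'yr2': 5}
print(relative)    # {'yr1': 0.4, 'yr2': 0.6}
```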

Distribution

A distribution describes how the values in your data are spread out; a histogram shows its shape.
There are various named classifications of distributions that describe the shape in your chart. Some examples:

Normal Distribution

      |
      | |
    | | |
  | | | | |
  +-+-+-+-+

Skewed Right

  |
| |
| | |
| | | |
+-+-+-+-+

Bimodal

  |
  |   |
| | | |
| | | |
+-+-+-+-+

Outliers are data points that lie far away from the majority of the data.

       |
       |
       | |     |
|    | | |     |
+----+-+-+-----+

^              ^
+--------------+
   (outliers)
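
One common way to flag outliers is the 1.5 x IQR rule (Tukey's fences). These notes do not prescribe a method, so treat this as one possible sketch; the `find_outliers` helper, the `k=1.5` multiplier and the sample data are assumptions.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # interquartile range: spread of the middle 50%
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data: most values sit around 20-24, with one stray on each side.
data = [1, 20, 21, 21, 22, 22, 22, 23, 23, 24, 45]
print(find_outliers(data))  # [1, 45]
```

Whether a flagged value is an error or a genuine extreme still needs a human decision; the rule only marks candidates.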

Quantiles

Quantiles are cut points that split your data, sorted by value, into groups that each hold a fixed share of the observations.
They can be used to summarize where the data lies, and to flag anomalies at the extremes.

           |
           |
Frequency  |   |
  of       |   |       |
 ${X}      |   | |   | | |
           |   | | | | | |
           | | | | | | | |
           +-+-+-+-+-+-+-+
             1 2 3 4 5 6 7
                   ^
                 median

The median is an example of a quantile.
It splits the sorted observations in half: at least 50% are less than or equal to it, and at least 50% are greater than or equal to it.
The chart holds 21 observations, so the median (50th percentile) is the 11th one in sorted order, which has the value 4.

Similarly, the 85th percentile is 6: it is the smallest value with at least 85% of the observations (18 of 21) at or below it.
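
A small Python sketch of looking up a percentile with the nearest-rank convention: sort the observations by value, then take the smallest one that has at least p% of the data at or below it. The `percentile` helper and the `scores` data are hypothetical. Other conventions interpolate between neighbours and can give slightly different answers (for example `statistics.median` averages the two middle values of an even-sized list).

```python
import math

def percentile(values, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)  # quantiles work on values sorted by size
    # 1-based rank of the observation that covers at least p% of the data.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical observations (unsorted on purpose).
scores = [12, 15, 11, 19, 14, 30, 13, 17, 16, 18]

print(percentile(scores, 50))  # 15
print(percentile(scores, 85))  # 19
```

Note that the single large value (30) does not move the median, which is why quantiles are useful when the data contains anomalies.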