Statistics sampling

From wikinotes

On methods of sampling, and pros/cons.

Terminology

sampling-frame:  The list of population members that a sample is actually drawn from.

undercoverage:   Omitting population members from a sampling-frame when they should have been included.

Sampling Methods

Simple Random Sampling (SRS)

Choosing N items at random, so that every item has an equal chance of being picked (ex. drawing names from a hat).

Issues:

  • cannot sample without a complete list of the population (you need to know who is present)
  • list must be representative (ex. a sample of cafeteria clients taken on a Saturday will deviate from weekday norms)
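
A minimal sketch of drawing names from a hat, assuming the sampling-frame is available as a Python list. The function name `simple_random_sample`, the `seed` parameter, and the example names are hypothetical and only here for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the frame, each equally likely to be picked."""
    rng = random.Random(seed)           # a seed only makes the draw repeatable
    return rng.sample(list(frame), n)   # sampling without replacement

# Hypothetical sampling-frame.
frame = ["ann", "bob", "cho", "dee", "eli", "fay"]
sample = simple_random_sample(frame, 3, seed=1)
print(sample)
```

Asking for more items than the frame contains raises a ValueError, which mirrors the first issue above: you cannot draw people who are not on the list.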

Stratified Sampling

In stratified sampling, you partition your population into subcategories (strata), then sample from each one.
(Ex. First year students, second year students, third year students.)
This ensures that small subgroups/minorities are represented in your sample.

Issues:

  • possible to oversample one group (ex. a minority gets equal weight to the majority if every stratum contributes the same number of items)
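
One common way to limit the oversampling issue above is to take the same fraction from every stratum (proportional allocation). The sketch below assumes that approach; `stratified_sample`, `stratum_of`, and the student data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Take the same fraction of every stratum (at least one unit each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # partition into strata
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))   # random draw inside the stratum
    return sample

# Hypothetical frame: 20 first-years, 10 second-years, 10 third-years.
years = [1] * 20 + [2] * 10 + [3] * 10
students = [("s%02d" % i, year) for i, year in enumerate(years)]
sample = stratified_sample(students, lambda s: s[1], fraction=0.2, seed=0)
print(sample)   # 4 first-years, 2 second-years, 2 third-years
```

Drawing a fixed count per stratum instead would give each year equal weight, which is the oversampling case described in the issue.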

Systematic Sampling

Pick every Nth individual from an ordered list for your sample, usually starting from a random position.
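
A minimal sketch of picking every Nth individual, assuming the frame is an ordered list. The random starting offset is a common convention rather than something these notes specify; `systematic_sample` and `step` are hypothetical names.

```python
import random

def systematic_sample(frame, step, seed=None):
    """Pick every step-th member, starting at a random offset below step."""
    rng = random.Random(seed)
    start = rng.randrange(step)        # random start in 0 .. step-1
    return list(frame)[start::step]

# Hypothetical frame: individuals numbered 1 to 20.
frame = list(range(1, 21))
sample = systematic_sample(frame, step=5, seed=3)
print(sample)   # 4 individuals, each 5 places apart
```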

Cluster Sampling

Define your population, then divide it into clusters, where ideally:

  • every measured characteristic is represented within each cluster
  • each cluster has similar distribution of characteristics
  • all clusters taken together cover the entire population

Choose clusters at random, and use every member of the chosen clusters as your sample.

# population    clusters           randomly-selected
#                                     clusters
a b c d e      abfg  cdhi  ejo
f g h i j  ->                   ->  abfg  klmn
k l m n o      klmn

When used:

  • When geography is important to sampling (ex. the population is spread over a wide area)

Issues:

  • People in a cluster may be similar in a way that makes the problem hard to study
    (ex. other contaminants shared by one area in a pollution study)
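
A minimal sketch of the cluster procedure above, assuming the clusters are already defined as a mapping from cluster name to members. `cluster_sample` and the block/household names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: households grouped by city block.
clusters = {
    "block_1": ["h01", "h02", "h03"],
    "block_2": ["h04", "h05", "h06"],
    "block_3": ["h07", "h08", "h09"],
    "block_4": ["h10", "h11", "h12"],
}
sample = cluster_sample(clusters, 2, seed=5)
print(sample)
```

The random draw happens over cluster names, not individuals, so two chosen blocks contribute all six of their households.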

Convenience Sampling

Using whatever results or data are easiest to obtain.
Ex. asking your friends what they think of a product, instead of asking customers.

Issues:

  • unreliable: the sample is unlikely to be representative of the population (uneven distribution)
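
A small illustration of why the result is unreliable, using made-up satisfaction scores (all numbers below are invented for the example, not real data). The convenience sample keeps only the respondents who were easiest to reach.

```python
from statistics import mean

# Invented (group, score) responses: friends happen to rate higher.
responses = [
    ("friend", 9), ("friend", 8), ("friend", 9),
    ("customer", 4), ("customer", 6), ("customer", 5),
    ("customer", 3), ("customer", 7),
]

convenience = [score for group, score in responses if group == "friend"]
everyone = [score for _, score in responses]

print(mean(convenience))   # mean over friends only
print(mean(everyone))      # mean over all respondents, which is lower
```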

Multi-Stage Sampling

Combining multiple sampling methods in a pipeline, where each stage samples from the output of the previous one.
Ex. You may start with a cluster sample, then narrow the results using simple random sampling, then finally use stratified sampling on those results.
Typically used if you have a giant population, ex. often used by governments.
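
The pipeline in the example above can be sketched as three stages run in order. This is only one possible arrangement; `multi_stage_sample`, its parameters, and the town/age-group data are all hypothetical.

```python
import random
from collections import defaultdict

def multi_stage_sample(clusters, n_clusters, n_per_cluster,
                       stratum_of, n_per_stratum, seed=None):
    """Cluster sample, then simple random sample, then stratified sample."""
    rng = random.Random(seed)
    # Stage 1: pick whole clusters at random.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sample inside each chosen cluster.
    pooled = []
    for name in chosen:
        pooled.extend(rng.sample(clusters[name], n_per_cluster))
    # Stage 3: stratify the pooled units and draw from each stratum.
    strata = defaultdict(list)
    for unit in pooled:
        strata[stratum_of(unit)].append(unit)
    final = []
    for label in sorted(strata):
        members = strata[label]
        final.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return final

# Hypothetical population: 4 towns, 10 people each, alternating age groups.
clusters = {
    "town_%d" % t: [("t%dp%d" % (t, i), "young" if i % 2 == 0 else "old")
                    for i in range(10)]
    for t in range(4)
}
sample = multi_stage_sample(clusters, n_clusters=2, n_per_cluster=6,
                            stratum_of=lambda p: p[1], n_per_stratum=2, seed=7)
print(sample)   # 2 "old" and 2 "young" people, from at most 2 towns
```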