Skip to content

Cheatsheet: Summarizing data: center and spread

Every summary answers two questions: where is the center of the data, and how spread out is it? Center without spread is half a description, and the wrong center (mean on skewed data) is a lie.

CenterHowUses every value?Outlier-sensitive?Best for
MeanAdd all, divide by countYesYes (dragged by outliers)Symmetric, outlier-free data
MedianMiddle value when sortedNoNo (robust)Skewed data, data with outliers
ModeMost frequent valueNoNoCategorical data

When the mean and median disagree, the data is skewed; report the median as the typical value.

MeasureHowNote
RangeMax minus minSimple but fragile; only the two extremes
VarianceAverage of squared distances from the meanIn squared units; hard to read directly
Standard deviationSquare root of the varianceTypical distance from the mean, in original units
Data: 2, 4, 4, 4, 5, 5, 7, 9
Mean = 40 / 8 = 5 Median = (4+5)/2 = 4.5 Mode = 4
Squared distances from 5: 9,1,1,1,0,0,4,16 (sum = 32)
Variance = 32 / 8 = 4 Standard deviation = sqrt(4) = 2

Standardizing a feature (machine learning)

Section titled “Standardizing a feature (machine learning)”
standardized value = (value - mean) / standard deviation

Recenters the feature at 0, rescales its spread to be comparable across features, and keeps a large-unit feature (income) from numerically dominating a small-unit one (age). The standardized value is the z-score (a later lesson).

  • Reporting the mean for skewed data (use the median).
  • Reporting a center with no spread.
  • Confusing variance (squared units) with standard deviation (original units).
  • Using the range as your spread (one outlier blows it up).
  • Forgetting the mode is the only center that works for categories.
  • Mean: the arithmetic average; uses every value; outlier-sensitive.
  • Median: the middle value when sorted; robust to outliers.
  • Mode: the most frequent value; the center for categorical data.
  • Variance: average squared distance from the mean.
  • Standard deviation: square root of variance; typical distance from the mean, in original units.
  • Skew: a few extreme values stretching one tail, pushing the mean away from the median.