Practice: Summarizing data: center and spread
The aim here is fluency with the two questions every summary answers (where is the center, how spread out is it) and the judgment to pick the right summary. Bring a scratchpad; the last exercise asks you to compute everything by hand.
Self-check
Six short questions. Answer each in your head before opening the collapsible.
1. When do the mean and the median disagree, and which is usually the more honest “typical” value?
Show answer
They disagree when the data is skewed, that is, when a few extreme values stretch one tail. The mean uses every value and gets dragged toward the extremes; the median only cares how many values fall on each side, so it stays put. For skewed data (incomes, prices, wait times), the median is usually the more honest summary of the typical case.
2. Why is the median called “robust” while the mean is not?
Show answer
Because the median barely moves when extreme values change. You can make one value enormous and the median stays where it is, since it only depends on the order and count of values, not their size. The mean, which adds every value, shifts toward any outlier. Robust means resistant to outliers.
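To see that robustness in numbers, here is a minimal Python sketch using the standard-library `statistics` module. The values are made up for illustration; the two lists differ only in their largest value.

```python
from statistics import mean, median

# Hypothetical values, chosen only for illustration.
original = [3, 5, 6, 8, 9]
with_outlier = [3, 5, 6, 8, 900]  # same list, largest value made enormous

mean_before, median_before = mean(original), median(original)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean shifts a long way toward the outlier; the median does not move,
# because the middle position of the sorted list still holds the same value.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

Only the size of one value changed, not the order or the count, which is all the median depends on.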
3. Two datasets both have a mean of 50. Do you know they look alike?
Show answer
No. They have the same center but could have wildly different spread: one might be 49, 50, 51 and the other 0, 50, 100. Center without spread is half a description. You need a spread measure (range, variance, or standard deviation) to know how widely the values are scattered around that center.
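The two example datasets from this answer can be compared directly. This is a small sketch, assuming the population form of the standard deviation (`statistics.pstdev`, which divides by the number of values):

```python
from statistics import mean, pstdev

tight = [49, 50, 51]
wide = [0, 50, 100]

summary = {}
for name, data in (("tight", tight), ("wide", wide)):
    summary[name] = {
        "mean": mean(data),
        "range": max(data) - min(data),
        "std_dev": pstdev(data),
    }
    print(name, summary[name])
```

Both means come out as 50, while the range and the standard deviation differ by a factor of about fifty.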
4. What does the standard deviation measure, in plain words?
Show answer
Roughly the typical distance of a value from the mean. You measure how far each value is from the mean, square those distances, average them (that average is the variance), then take the square root to get back to the original units. A small standard deviation means the data huddles near the mean; a large one means it spreads out.
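The four steps in that answer map onto four lines of code. The sketch below uses a hypothetical list of values and averages over all n values, as the answer describes; a sample variance would divide by n - 1 instead.

```python
import math

# Hypothetical data, chosen so the arithmetic comes out round.
values = [2, 4, 4, 4, 5, 5, 7, 9]

center = sum(values) / len(values)           # the mean
distances = [x - center for x in values]     # step 1: distance from the mean
squared = [d ** 2 for d in distances]        # step 2: square each distance
variance = sum(squared) / len(squared)       # step 3: average the squares
std_dev = math.sqrt(variance)                # step 4: back to original units

print(center, variance, std_dev)
```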
5. Why do we square the distances when computing variance instead of just averaging them?
Show answer
Because distances above and below the mean would cancel out: the plain average of the signed distances from the mean is always zero. Squaring makes every distance positive so they add up, and it makes larger distances count disproportionately more (a distance twice as large contributes four times as much). Taking the square root at the end (the standard deviation) returns the result to the original units.
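Both effects of squaring can be checked on a tiny example. The values below are hypothetical, picked so that one point sits far from the mean:

```python
# Hypothetical data with one value far above the rest.
values = [1, 2, 3, 10]
center = sum(values) / len(values)  # 4.0

signed = [x - center for x in values]
squared = [d ** 2 for d in signed]

# Signed distances cancel: their sum (and so their average) is zero.
signed_total = sum(signed)

# Share of the total contributed by the farthest point,
# first with absolute distances, then with squared distances.
absolute = [abs(d) for d in signed]
abs_share = max(absolute) / sum(absolute)
sq_share = max(squared) / sum(squared)

print(signed_total, abs_share, sq_share)
```

The farthest point accounts for half of the total absolute distance but nearly three quarters of the total squared distance, which is what "count disproportionately more" means here.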
6. In machine learning, what does it mean to “standardize” a feature, and which summaries does it use?
Show answer
Subtract the feature’s mean and divide by its standard deviation, so the feature is recentered at 0 and rescaled to a comparable spread. It uses exactly the two summaries from this lesson, and it keeps a feature measured in huge units (income) from numerically overwhelming one in small units (age).
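A minimal sketch of that recipe in plain Python follows. The `standardize` helper and both feature lists are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumption; a real pipeline may use a library routine and a different convention.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation.

    Assumes the values are not all identical, since a zero
    standard deviation would cause a division by zero.
    """
    mu = mean(values)
    sigma = pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical features measured in very different units.
incomes = [30_000, 45_000, 60_000, 75_000, 90_000]
ages = [22, 31, 40, 49, 58]

z_income = standardize(incomes)
z_age = standardize(ages)

print([round(z, 2) for z in z_income])
print([round(z, 2) for z in z_age])
```

After standardizing, both features have mean 0 and standard deviation 1, so neither dominates purely because of its units.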
Try it yourself: which summary, and why?
For each situation, decide whether the mean or the median better describes the “typical” value, and say why in one line.
A. The salaries at a 200-person company where the three founders each earn far more than everyone else.
B. The heights of players on a basketball team (all roughly similar, no extreme outliers).
C. House sale prices in a city where most homes are modest but a handful of mansions sell for 50x the rest.
D. The scores on a quiz where results are bunched symmetrically around the middle.
Show answer
- A: median. A few very high founder salaries drag the mean upward, so the median better describes what a typical employee earns.
- B: either, mean is fine. Roughly symmetric with no extreme outliers, so the mean and median nearly agree.
- C: median. Classic right-skew; the mansions stretch the mean far above what a typical home costs.
- D: either, mean is fine. Symmetric, bunched data; mean and median land in nearly the same place.
The pattern: extreme outliers or a long tail push you to the median; symmetric, outlier-free data lets you use the mean.
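Situation A can be put into numbers to show how far the two centers drift apart. The salary figures below are invented for illustration; only the 200-person, three-founder setup comes from the exercise.

```python
from statistics import mean, median

# Hypothetical payroll: 197 staff at 60,000 and 3 founders at 2,000,000.
salaries = [60_000] * 197 + [2_000_000] * 3

typical_mean = mean(salaries)      # pulled upward by the three founders
typical_median = median(salaries)  # middle of the sorted list

# How many employees earn less than the mean?
below_mean = sum(1 for s in salaries if s < typical_mean)

print(typical_mean, typical_median, below_mean)
```

With these made-up figures the mean is 89,100 while the median is 60,000, and 197 of the 200 employees earn less than the mean.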
Try it yourself: compute everything by hand
Here is a small dataset, the number of support tickets closed per day over five days:
4, 6, 6, 7, 12
Compute, in order: the mean, the median, the mode, the variance, and the standard deviation. Then say whether the data is skewed and which center you would report. Work it out before checking.
Show answer
Sorted: 4, 6, 6, 7, 12 (already in order)
Mean = (4 + 6 + 6 + 7 + 12) / 5 = 35 / 5 = 7
Median = the middle (3rd) value = 6
Mode = the most common value = 6
Distances from the mean (7): -3, -1, -1, 0, 5
Squared distances: 9, 1, 1, 0, 25 (sum = 36)
Variance = 36 / 5 = 7.2
Standard deviation = square root of 7.2 = about 2.7
Is it skewed? Yes, slightly right-skewed: the mean (7) sits above the median (6) because the single high value, 12, pulls the mean up while the median stays at the center of the count.
Which center would you report? For a “typical day,” the median of 6 is the more honest summary, because the one busy day inflates the mean. And the standard deviation of about 2.7 tells you a typical day lands roughly 2 to 3 tickets away from the center, which the 12-ticket day clearly exceeds.
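Once you have worked it by hand, the results can be checked with Python's standard-library `statistics` module. The hand calculation divides by 5 (all n values), so the matching functions are `pvariance` and `pstdev`; the sample versions `variance` and `stdev` divide by n - 1 and would give 9 and 3 for this data.

```python
from statistics import mean, median, mode, pvariance, pstdev

tickets = [4, 6, 6, 7, 12]

results = {
    "mean": mean(tickets),
    "median": median(tickets),
    "mode": mode(tickets),
    "variance": pvariance(tickets),  # divides by n, like the hand calculation
    "std_dev": pstdev(tickets),      # square root of the variance above
}

for name, value in results.items():
    print(f"{name}: {value:.2f}")
```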
Flashcards
Eight cards. Click any card to reveal the answer. Use the Print flashcards button to lay out the full set as one card per page for offline review.
Q. Mean vs median: which uses every value, and which resists outliers?
The mean uses every value (so outliers drag it). The median is the middle value and resists outliers (it depends only on order and count). When they disagree, the data is skewed.
Q. When should you report the median instead of the mean?
When the data is skewed or has extreme outliers (incomes, prices, wait times). The median better describes the typical case; the mean gets pulled toward the extremes.
Q. What is the mode, and when is it the right center?
The most frequently occurring value. It is the natural center for categorical data (most common product, most frequent error code), where mean and median do not apply.
Q. Why is center alone not enough to describe a dataset?
Because two datasets with the same mean can have completely different spread (49, 50, 51 vs 0, 50, 100). You need a spread measure (range, variance, standard deviation) to know how widely the values are scattered.
Q. What is variance, and why are the distances squared?
Variance is the average squared distance from the mean. Distances are squared so above-and-below values do not cancel to zero and so larger distances count more.
Q. What is the standard deviation, in plain words?
The square root of the variance: roughly the typical distance of a value from the mean, reported in the original units (not squared units like variance).
Q. What does standardizing a feature do in machine learning?
Subtract the feature’s mean and divide by its standard deviation, recentering it at 0 and rescaling its spread. It stops a large-unit feature (income) from numerically dominating a small-unit one (age).
Q. Why is the range a fragile measure of spread?
It is largest minus smallest, so it depends only on the two most extreme points. A single outlier can blow it up while the bulk of the data is tightly packed. Standard deviation uses the whole set.
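The last card can be illustrated with a short sketch. All three lists are hypothetical; `value_range` is a helper defined here, not a library function.

```python
from statistics import pstdev

def value_range(values):
    """Largest minus smallest: only the two extremes matter."""
    return max(values) - min(values)

# Same extremes, very different interiors.
bunched = [0, 50, 50, 50, 50, 50, 100]
split = [0, 0, 0, 50, 100, 100, 100]

# A tightly packed set, then the same set with one far-off value added.
packed = [10, 11, 11, 12, 12, 12, 13, 13, 14]
packed_plus_one = packed + [60]

print(value_range(bunched), value_range(split))   # identical ranges
print(pstdev(bunched), pstdev(split))             # different standard deviations
print(value_range(packed), value_range(packed_plus_one))
```

The range cannot tell `bunched` from `split`, while the standard deviation can, and a single added value moves the range of the packed set from 4 to 50. The standard deviation also rises when that value is added; its advantage is that every point contributes, not that it ignores outliers.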