Practice: How sure are we? Confidence intervals
Two skills: building an interval (estimate plus or minus a margin of error) and interpreting it correctly, which is where almost everyone slips. The interpretation drill is the one that matters most. Keep a scratchpad.
Self-check
Six short questions. Answer each in your head before reading its answer.
1. Why report a confidence interval instead of just a point estimate?
Because a single number hides its own uncertainty. The estimate would come out differently on another sample, so reporting only “90%” pretends to a precision it does not have. An interval shows the range of plausible values for the truth, making the uncertainty explicit.
2. How do you build a 95% confidence interval, roughly?
Estimate plus or minus a margin of error, where the margin is a multiplier times the standard error. For 95% confidence the multiplier is about 2 (more precisely 1.96, from the normal distribution). So a 95% interval is roughly the estimate +/- 2 standard errors.
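The recipe can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `confidence_interval` and the inputs (an estimate of 0.90 with a standard error of 0.02) are made up for the example, and the multiplier is taken from the standard library's `statistics.NormalDist`, assuming the estimate is approximately normally distributed.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Return (low, high) for estimate +/- multiplier * standard_error.

    Hypothetical helper; assumes the estimate is approximately normal.
    """
    # Two-sided interval: leave (1 - level) / 2 in each tail.
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Illustrative inputs only: estimate 0.90, standard error 0.02.
low, high = confidence_interval(0.90, 0.02)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

With the rounded multiplier of 2 the same inputs give [0.86, 0.94]; the exact multiplier makes the interval very slightly narrower.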
3. What two things change the width of the interval, and in which direction?
Sample size: more data shrinks the standard error (sigma over root n) and narrows the interval. Confidence level: a higher level (99% vs 95%) uses a bigger multiplier and widens the interval. A tight, high-confidence interval comes from more data, not from the confidence dial.
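A small Python sketch makes the two dials concrete. It assumes a normal-based interval whose standard error is sigma over root n; the function name `interval_width` and the value `sigma = 1.0` are illustrative choices, not part of the lesson.

```python
import math
from statistics import NormalDist


def interval_width(sigma, n, level):
    """Full width of a normal-based interval: 2 * multiplier * sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * multiplier * standard_error


sigma = 1.0  # illustrative spread, not taken from the lesson
for n in (100, 400):
    for level in (0.95, 0.99):
        width = interval_width(sigma, n, level)
        print(f"n={n:4d}  level={level:.2f}  width={width:.3f}")
```

Because the standard error scales with one over root n, quadrupling the sample size halves the width, while moving from 95% to 99% widens it at any fixed n.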
4. What does a 95% confidence interval actually mean?
That the procedure is reliable: if you repeated the sampling and interval-building many times, about 95% of the intervals would contain the true parameter. It is a statement about the method’s long-run hit rate, not about this one interval.
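The long-run hit rate can be checked by simulation. The sketch below is one possible setup under stated assumptions: data are drawn from a normal distribution with a known true mean, each sample gets its own interval, and we count how often the interval contains the truth. The function name `coverage` and all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def coverage(true_mean=10.0, sigma=2.0, n=50, trials=2000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build its interval.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimate = mean(sample)
        standard_error = stdev(sample) / math.sqrt(n)
        low = estimate - multiplier * standard_error
        high = estimate + multiplier * standard_error
        # The truth is fixed; the interval is what changes between trials.
        hits += low <= true_mean <= high
    return hits / trials


print(f"share of intervals containing the truth: {coverage():.3f}")
```

The printed share should land close to 0.95. It can come out slightly lower because the sketch uses the normal multiplier with a standard deviation estimated from each sample.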
5. What is the common WRONG interpretation, and why is it wrong?
The wrong reading is “there is a 95% probability the true value lies in this particular interval.” It is wrong because the true parameter is a fixed number, not random: it is either in this interval or not. What is random is the interval (it depends on the sample drawn), so the 95% describes the procedure, not this specific range.
6. Two models’ confidence intervals overlap heavily. What can you conclude?
That you cannot conclude one is better from this data. The point estimates may differ, but heavy overlap means the difference is within the noise; the evidence does not support a real gap. Deciding whether a difference is real is the job of a hypothesis test (the next lesson).
Try it yourself: build the interval
A landing page converts 40% of visitors in a test, and the standard error of that conversion rate is 3 percentage points.
1. Build the 95% confidence interval (use multiplier 2).
2. Build the 99% confidence interval (use multiplier about 2.6).
3. With a much larger test, the standard error drops to 1.5 points. What is the new 95% interval?
1. 95% CI = 40% +/- 2 x 3% = 40% +/- 6% = [34%, 46%]
2. 99% CI = 40% +/- 2.6 x 3% = 40% +/- 7.8% = [32.2%, 47.8%] (wider: more confidence)
3. 95% CI = 40% +/- 2 x 1.5% = 40% +/- 3% = [37%, 43%] (narrower: more data)
The two dials in action: raising confidence (1 to 2) widened the interval; adding data (1 to 3) narrowed it. A tight, high-confidence interval would need still more data.
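The three answers can be checked with a short Python sketch. The helper name `interval` is hypothetical, and the multipliers are the lesson's rounded values (2 and 2.6), not exact normal quantiles.

```python
def interval(estimate, standard_error, multiplier):
    """Return (low, high) for estimate +/- multiplier * standard_error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# (label, estimate in %, standard error in points, multiplier)
cases = [
    ("95%, SE 3 points", 40.0, 3.0, 2.0),
    ("99%, SE 3 points", 40.0, 3.0, 2.6),
    ("95%, SE 1.5 points", 40.0, 1.5, 2.0),
]

for label, estimate, standard_error, multiplier in cases:
    low, high = interval(estimate, standard_error, multiplier)
    print(f"{label}: [{low:.1f}%, {high:.1f}%]")
```

Running it prints [34.0%, 46.0%], [32.2%, 47.8%], and [37.0%, 43.0%], matching the hand calculation.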
Try it yourself: true or false (the interpretation drill)
A model’s accuracy is reported as 90% with a 95% confidence interval of [86%, 94%]. Mark each statement true or false.
A. "There is a 95% probability the true accuracy is between 86% and 94%."
B. "If we re-ran this measurement many times, about 95% of the intervals we built would contain the true accuracy."
C. "95% of the test examples had scores between 86% and 94%."
D. A second model scores 91% with a 95% interval of [87%, 95%]. "We can conclude the second model is more accurate."
- A: false. The classic misreading. The true accuracy is fixed; it is either in [86%, 94%] or not. The 95% describes the procedure, not the probability for this specific interval.
- B: true. This is the correct, procedure-based interpretation.
- C: false. The interval is a range of plausible values for the parameter (the accuracy), not a range that contains 95% of individual data points.
- D: false. The two intervals ([86%, 94%] and [87%, 95%]) overlap heavily, so the 1-point difference in point estimates is within the noise. You cannot conclude the second model is better from this data.
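To put a number on "overlap heavily" for statement D, a rough Python sketch follows. It only mirrors the rule of thumb used in this drill and is not a hypothesis test; the helper name `overlap_fraction` and the choice to measure overlap against the shorter interval are assumptions made for the illustration.

```python
def overlap_fraction(a, b):
    """Length of the overlap of intervals a and b, as a fraction of the shorter one."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    shorter = min(a[1] - a[0], b[1] - b[0])
    # Disjoint intervals give a negative overlap; report that as zero.
    return max(overlap, 0.0) / shorter


first_model = (86.0, 94.0)   # 95% CI from the drill, in percent
second_model = (87.0, 95.0)  # 95% CI of the second model, in percent
print(f"overlap: {overlap_fraction(first_model, second_model):.1%}")
```

The two intervals share 7 of their 8 points of width, so the script prints an overlap of 87.5%.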
Flashcards
Eight cards.
Q. Why report a confidence interval instead of a point estimate?
A single number hides its uncertainty; it would differ on another sample. An interval shows the range of plausible values for the truth, making the uncertainty explicit.
Q. How do you build a 95% confidence interval?
Estimate +/- margin of error, where margin = multiplier x standard error. For 95% the multiplier is about 2 (1.96), so roughly estimate +/- 2 standard errors.
Q. What two things set the width of a confidence interval?
Sample size (more data -> smaller standard error -> narrower) and confidence level (higher level -> bigger multiplier -> wider). Tight + high-confidence comes from more data.
Q. What is the correct interpretation of a 95% confidence interval?
A statement about the procedure: if you repeated the sampling and interval-building many times, about 95% of the intervals would contain the true parameter.
Q. What is the common WRONG interpretation of a confidence interval?
‘There is a 95% probability the truth is in THIS interval.’ Wrong because the parameter is fixed, not random; the interval is what varies. The 95% is the procedure’s long-run hit rate.
Q. Does a 95% confidence interval contain 95% of the data?
No. It is a range of plausible values for the parameter (like the mean or accuracy), not a range holding 95% of individual data points.
Q. Two results have heavily overlapping confidence intervals. What follows?
You cannot conclude one is better from this data; the difference is within the noise. Establishing a real difference requires a hypothesis test.
Q. Why report AI metrics with confidence intervals?
‘90%’ implies false precision; ‘90%, 95% CI [86%, 94%]’ tells the reader how much to trust it. A small test set yields a wide interval, which honestly signals how little is actually known.
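How much the test set size matters can be sketched in Python. This example assumes the simple normal-approximation interval for a proportion, with standard error sqrt(p(1 - p) / n); that formula, the helper name `accuracy_interval`, and the test set sizes are assumptions for illustration and are not taken from the cards above.

```python
import math


def accuracy_interval(accuracy, n, multiplier=1.96):
    """Normal-approximation interval for an accuracy measured on n examples."""
    standard_error = math.sqrt(accuracy * (1 - accuracy) / n)
    margin = multiplier * standard_error
    # Clamp to [0, 1] because an accuracy cannot leave that range.
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)


for n in (100, 1000, 10000):  # hypothetical test set sizes
    low, high = accuracy_interval(0.90, n)
    print(f"n={n:5d}: 90% accuracy, 95% CI [{low:.1%}, {high:.1%}]")
```

At n = 100 the margin is roughly 6 points either way, and it shrinks by a factor of ten when the test set grows a hundredfold.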