Practice: How sure are we? Confidence intervals

Two skills: building an interval (estimate plus or minus a margin of error) and interpreting it correctly, which is where almost everyone slips. The interpretation drill is the one that matters most. Keep a scratchpad.

Six short questions. Answer each in your head before opening the collapsible.

1. Why report a confidence interval instead of just a point estimate?

Show answer

Because a single number hides its own uncertainty. The estimate would come out differently on another sample, so reporting only “90%” pretends to a precision it does not have. An interval shows the range of plausible values for the truth, making the uncertainty explicit.

2. How do you build a 95% confidence interval, roughly?

Show answer

Estimate plus or minus a margin of error, where the margin is a multiplier times the standard error. For 95% confidence the multiplier is about 2 (more precisely 1.96, from the standard normal distribution). So a 95% interval is about estimate +/- 2 standard errors.
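As a minimal sketch of that recipe, the multiplier can be read off the standard normal distribution rather than memorized. The function name `confidence_interval` and the input numbers are illustrative assumptions, not part of the lesson.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Return (low, high) for estimate +/- multiplier * std_error."""
    # Two-sided interval: half of the leftover probability goes in each tail.
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = multiplier * std_error
    return estimate - margin, estimate + margin


# Hypothetical numbers: an estimate of 0.50 with a standard error of 0.05.
low, high = confidence_interval(0.50, 0.05)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Using the rounded multiplier 2 instead of 1.96 gives a slightly wider interval, which is why the shortcut is safe for mental arithmetic.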

3. What two things change the width of the interval, and in which direction?

Show answer

Sample size: more data shrinks the standard error (sigma over root n) and narrows the interval. Confidence level: a higher level (99% vs 95%) uses a bigger multiplier and widens the interval. A tight, high-confidence interval comes from more data, not from the confidence dial.
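The two dials can be seen side by side in a short sketch. It assumes a normal-approximation interval for a mean; `interval_width` and the value of `sigma` are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, level):
    """Full width of a normal-approximation interval for a mean."""
    std_error = sigma / sqrt(n)  # more data -> smaller standard error
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)  # higher level -> bigger multiplier
    return 2 * multiplier * std_error


sigma = 10.0  # hypothetical spread of the individual measurements
for n in (100, 400):
    for level in (0.95, 0.99):
        width = interval_width(sigma, n, level)
        print(f"n={n}  level={level:.2f}  width={width:.2f}")
```

Because the standard error scales with one over root n, quadrupling the sample size halves the width at any fixed confidence level.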

4. What does a 95% confidence interval actually mean?

Show answer

That the procedure is reliable: if you repeated the sampling and interval-building many times, about 95% of the intervals would contain the true parameter. It is a statement about the method’s long-run hit rate, not about this one interval.
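The long-run hit rate can be checked with a small simulation. This is a sketch under stated assumptions: the data are drawn from a normal distribution with a known true mean, and `coverage` and its default parameters are hypothetical, chosen only to make the experiment quick.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean=50.0, sigma=10.0, n=100, trials=2000, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build an interval from it alone.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimate = mean(sample)
        std_error = stdev(sample) / sqrt(n)
        low = estimate - multiplier * std_error
        high = estimate + multiplier * std_error
        # The true mean never moves; only the interval does.
        hits += low <= true_mean <= high
    return hits / trials


rate = coverage()
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should land close to 0.95. Any single interval in the loop either contains 50.0 or does not; the 95% only appears when counting across many repetitions.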

5. What is the common WRONG interpretation, and why is it wrong?

Show answer

The wrong reading is “there is a 95% probability the true value lies in this particular interval.” It is wrong because the true parameter is a fixed number, not random: it is either in this interval or not. What is random is the interval (it depends on the sample drawn), so the 95% describes the procedure, not this specific range.

6. Two models’ confidence intervals overlap heavily. What can you conclude?

Show answer

That you cannot conclude one is better from this data. The point estimates may differ, but heavy overlap means the difference is within the noise; the evidence does not support a real gap. Deciding whether a difference is real is the job of a hypothesis test (the next lesson).

A landing page converts 40% of visitors in a test, and the standard error of that conversion rate is 3 percentage points.

1. Build the 95% confidence interval (use multiplier 2).
2. Build the 99% confidence interval (use multiplier about 2.6).
3. With a much larger test, the standard error drops to 1.5 points. What is the new 95% interval?
Show answer
1. 95% CI = 40% +/- 2 x 3% = 40% +/- 6% = [34%, 46%]
2. 99% CI = 40% +/- 2.6 x 3% = 40% +/- 7.8% = [32.2%, 47.8%] (wider: more confidence)
3. 95% CI = 40% +/- 2 x 1.5% = 40% +/- 3% = [37%, 43%] (narrower: more data)

The two dials in action: raising confidence (part 1 to part 2) widened the interval; adding data (part 1 to part 3) narrowed it. A tight, high-confidence interval would need still more data.

Try it yourself: true or false (the interpretation drill)

A model’s accuracy is reported as 90% with a 95% confidence interval of [86%, 94%]. Mark each statement true or false.

A. "There is a 95% probability the true accuracy is between 86% and 94%."
B. "If we re-ran this measurement many times, about 95% of the intervals we built would contain the true accuracy."
C. "95% of the test examples had scores between 86% and 94%."
D. A second model scores 91% with a 95% interval of [87%, 95%]. "We can conclude the second model is more accurate."
Show answer
  • A: false. The classic misreading. The true accuracy is fixed; it is either in [86%, 94%] or not. The 95% describes the procedure, not the probability for this specific interval.
  • B: true. This is the correct, procedure-based interpretation.
  • C: false. The interval is a range of plausible values for the parameter (the accuracy), not a range that contains 95% of individual data points.
  • D: false. The two intervals ([86%, 94%] and [87%, 95%]) overlap heavily, so the 1-point difference in point estimates is within the noise. You cannot conclude the second model is better from this data.
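To put a rough number on "within the noise" in statement D, the sketch below backs out each model's standard error from its interval and builds an interval for the difference. It rests on assumptions the exercise does not state: that both intervals used the multiplier 2, and that the two models were measured independently. The helper `std_error_from_interval` is hypothetical.

```python
from math import sqrt


def std_error_from_interval(low, high, multiplier=2.0):
    """Recover the standard error, assuming the interval is estimate +/- multiplier * SE."""
    return (high - low) / 2 / multiplier


se_first = std_error_from_interval(86.0, 94.0)   # 2 percentage points
se_second = std_error_from_interval(87.0, 95.0)  # 2 percentage points

diff = 91.0 - 90.0
# Standard errors combine in quadrature only if the two measurements are independent.
se_diff = sqrt(se_first**2 + se_second**2)
diff_low, diff_high = diff - 2 * se_diff, diff + 2 * se_diff
print(f"difference: {diff:.1f} points, 95% CI [{diff_low:.2f}, {diff_high:.2f}]")
```

Under these assumptions the interval for the difference comfortably includes zero, which matches the answer to D. If both models were scored on the same test examples, the independence assumption fails and a paired comparison would be needed instead.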

Eight cards.

Q. Why report a confidence interval instead of a point estimate?
A. A single number hides its uncertainty; it would differ on another sample. An interval shows the range of plausible values for the truth, making the uncertainty explicit.

Q. How do you build a 95% confidence interval?
A. Estimate +/- margin of error, where margin = multiplier x standard error. For 95% the multiplier is about 2 (1.96), so roughly estimate +/- 2 standard errors.

Q. What two things set the width of a confidence interval?
A. Sample size (more data -> smaller standard error -> narrower) and confidence level (higher level -> bigger multiplier -> wider). Tight + high-confidence comes from more data.

Q. What is the correct interpretation of a 95% confidence interval?
A. A statement about the procedure: if you repeated the sampling and interval-building many times, about 95% of the intervals would contain the true parameter.

Q. What is the common WRONG interpretation of a confidence interval?
A. “There is a 95% probability the truth is in THIS interval.” Wrong because the parameter is fixed, not random; the interval is what varies. The 95% is the procedure’s long-run hit rate.

Q. Does a 95% confidence interval contain 95% of the data?
A. No. It is a range of plausible values for the parameter (like the mean or accuracy), not a range holding 95% of individual data points.

Q. Two results have heavily overlapping confidence intervals. What follows?
A. You cannot conclude one is better from this data; the difference is within the noise. Establishing a real difference requires a hypothesis test.

Q. Why report AI metrics with confidence intervals?
A. “90%” implies false precision; “90%, 95% CI [86%, 94%]” tells the reader how much to trust it. A small test set yields a wide interval, which honestly signals how little is actually known.
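How strongly test-set size drives the width can be sketched as follows. The example assumes the simple normal-approximation interval for a proportion, with standard error sqrt(p(1 - p) / n) and the multiplier 2; the function `accuracy_interval` and the test-set sizes are hypothetical.

```python
from math import sqrt


def accuracy_interval(accuracy, n, multiplier=2.0):
    """Approximate 95% interval for an accuracy measured on n test examples."""
    std_error = sqrt(accuracy * (1 - accuracy) / n)
    margin = multiplier * std_error
    # Clamp to [0, 1] because an accuracy cannot leave that range.
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)


for n in (50, 225, 1000, 10000):
    low, high = accuracy_interval(0.90, n)
    print(f"n={n}: 90% accuracy, 95% CI [{low:.1%}, {high:.1%}]")
```

Under this approximation, a test set of about 225 examples gives roughly the [86%, 94%] interval quoted on the card, while 50 examples leave the interval more than twice as wide.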