References: Testing a claim: hypothesis testing and p-values

Source material

Source curriculum (structural mirror, cited as further study):
• Khan Academy, "Significance tests (hypothesis testing)" (Statistics & Probability)
  Author: Sal Khan and the Khan Academy team
  Unit page: https://www.khanacademy.org/math/statistics-probability/significance-tests-one-sample
  License: CC BY-NC-SA 4.0
Clawdemy's lessons are original prose that follows the pedagogical arc of this
unit. We do not embed, reproduce, or transcribe Khan's text or videos; we link
out to the relevant unit as recommended further study. The non-commercial
clause aligns with Clawdemy's free, zero-revenue posture. All rights to the
original materials remain with their authors.

Source-scope note: this lesson mirrors Khan's significance-testing material
(null and alternative hypotheses, the p-value, the significance level) and
restates it in Clawdemy's voice with original examples (a coin, an A/B model
comparison). The careful treatment of p-value misreadings, the link back to
the Bayes lessons' flipped conditional, and the multiple-testing caution are
Clawdemy's emphasis, because that is where the most damage happens in
practice. The AI framing (A/B testing, benchmark significance, the replication
trap) is also Clawdemy's own. Exact per-unit URLs are verified at promotion.

Read this next

Khan Academy: Significance tests (hypothesis testing) by Sal Khan and the Khan Academy team. The full unit this lesson mirrors, with videos and practice on setting up hypotheses, computing p-values, and making the reject / fail-to-reject decision. It is free and CC-licensed, and it is the place to drill the mechanics until the interpretation sticks.

Going deeper

A short, durable list. Both are free.

Khan Academy, “Confidence intervals” (within the course above). The previous lesson’s source; confidence intervals and hypothesis tests are two views of the same idea, and seeing both clarifies each.
Khan Academy, “Sampling distributions” (within the course above). Revisit the standard error and the central limit theorem; they are the machinery that turns an observed difference into a p-value.
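
To make "the machinery that turns an observed difference into a p-value" concrete, here is a minimal sketch of a two-proportion z-test for an A/B model comparison. The function name and the counts are hypothetical and chosen only for illustration; the normal approximation is assumed to be reasonable at these sample sizes (that is the central limit theorem at work).

```python
from math import erfc, sqrt


def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Hypothetical helper: the null hypothesis is that A and B share one
    underlying success rate, so the standard error uses the pooled rate.
    """
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    # Standard error of the difference in proportions under the null.
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided tail area of the standard normal: P(|Z| >= |z|).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Made-up counts: model A is right on 410 of 500 prompts, model B on 430 of 500.
ab_z, ab_p = two_proportion_z_test(wins_a=410, n_a=500, wins_b=430, n_b=500)
print(f"z = {ab_z:.2f}, two-sided p-value = {ab_p:.3f}")
```

The observed difference is divided by its standard error to get z, and the p-value is the probability of a z at least that extreme if the null hypothesis were true.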

Adjacent topics

Where this sits inside this track.

How sure are we? confidence intervals. The previous lesson. Overlapping intervals hinted that a difference might be noise; the hypothesis test makes that call formally.
When one event tells you about another: conditional probability. Phase 2. The central p-value misreading is exactly the flipped conditional from that lesson: P(data | null) is not P(null | data).
Statistics in machine learning. The capstone, and the next lesson. It ties hypothesis testing, confidence intervals, and the rest of the track to how AI systems are actually evaluated and compared.
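
The flipped conditional mentioned above can be seen numerically. The sketch below is an illustration under stated assumptions, not part of the source unit: it computes a two-sided p-value for a coin (a statement about P(data | null)) and, separately, P(null | data) in a hypothetical world with only two possible coins and an assumed prior. The flip counts, the prior of 0.9, and the alternative bias of 0.6 are all made up.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips with heads-probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, flips, p_null=0.5):
    """P(a count at least as far from the expected count as observed | null)."""
    expected = flips * p_null
    distance = abs(heads - expected)
    return sum(
        binom_pmf(k, flips, p_null)
        for k in range(flips + 1)
        if abs(k - expected) >= distance
    )


def posterior_null(heads, flips, prior_null, p_alt, p_null=0.5):
    """P(null | data) by Bayes' rule, assuming only two candidate coins.

    prior_null and p_alt are assumptions supplied by the caller; the
    p-value calculation above never uses either of them.
    """
    weight_null = prior_null * binom_pmf(heads, flips, p_null)
    weight_alt = (1 - prior_null) * binom_pmf(heads, flips, p_alt)
    return weight_null / (weight_null + weight_alt)


coin_p_value = two_sided_p_value(heads=60, flips=100)
coin_posterior = posterior_null(heads=60, flips=100, prior_null=0.9, p_alt=0.6)
print(f"p-value, from P(data | null): {coin_p_value:.3f}")
print(f"P(null | data), assumed prior: {coin_posterior:.3f}")
```

The two numbers answer different questions and need different inputs: the posterior depends on a prior and a specific alternative, while the p-value depends on neither. Changing the assumed prior moves the second number and leaves the first untouched.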