References: Train, test, and cross-validation

Source material

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Machine Learning Fundamentals: Cross Validation"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=fSytzGwwBVw
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)

Source material (hands-on companion):
• Microsoft: "ML For Beginners" (evaluation appears across modules)
  Repository: https://github.com/microsoft/ML-For-Beginners
  License: MIT

Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.

What this lesson draws from each source

StatQuest’s “Cross Validation” anchors the k-fold procedure, why a single train/test split gives an unstable performance estimate, and why scores are averaged across folds. The distinction between a validation set and a test set, the worked 5-fold computation, and the explicit data-leakage taxonomy (four common traps) are built out here as the lesson’s practical core.
Microsoft’s ML-For-Beginners is the hands-on companion for running cross-validation in scikit-learn, including stratified splits and pipelines that fit preprocessing inside each cross-validation fold rather than on the full dataset.

The “what question to ask of any reported accuracy” framing and the explicit setup of metrics for the next lesson are Clawdemy’s own.

Going deeper

StatQuest with Josh Starmer. The cross-validation video, plus related videos on training, testing, and other fundamentals.
Microsoft ML-For-Beginners. Hands-on lessons using scikit-learn’s train_test_split, KFold, and Pipeline (which is the standard tool for fitting preprocessing inside CV correctly).
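As a minimal sketch of how those three pieces fit together (not code taken from ML-For-Beginners), the example below holds out a test set with train_test_split, then runs 5-fold cross-validation on the training portion with a Pipeline so the scaler is refit on each training fold. Assumptions: scikit-learn is installed, the built-in breast-cancer dataset stands in for real data, StratifiedKFold (the class-balanced variant of KFold) is used for the folds, and the scaler/logistic-regression pair is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (assumption): any feature matrix X and labels y would do.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set once; it is not touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler lives inside the pipeline, so cross_val_score refits it on each
# fold's training portion only; no statistics leak from the validation fold.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified 5-fold split keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
print("per-fold accuracy:", np.round(fold_scores, 3))
print(f"CV mean = {fold_scores.mean():.3f}, std = {fold_scores.std():.3f}")

# Only after the model is chosen: refit on all training data, score the test set once.
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"held-out test accuracy = {test_acc:.3f}")
```

The spread of the five fold scores is the instability a single split hides; the mean is the cross-validated estimate, and the test score is reported once at the end.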

Adjacent topics

Pipelines (in scikit-learn or similar). The standard way to wrap preprocessing with the model so cross-validation refits each step on each training fold, automatically preventing the preprocessing-leak trap named here.
Nested cross-validation. A more careful procedure for when you both tune hyperparameters and want an unbiased generalization estimate: an outer CV loop evaluates while an inner CV loop tunes. It is outside this track’s scope, but worth knowing about.
Confusion matrix, precision, recall, ROC (the next lesson). The right metrics to evaluate with, once the evaluation procedure itself is set up correctly.
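The nested cross-validation idea from the adjacent topics above can be sketched in scikit-learn by passing a GridSearchCV object (the inner, tuning loop) to cross_val_score (the outer, evaluating loop). This is a minimal illustration under stated assumptions, not a procedure from either source: the built-in breast-cancer dataset, the grid of C values, and the fold counts are all arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative dataset (assumption).
X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Inner CV: chooses C using only the outer training fold it is handed.
# The grid values are arbitrary examples.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
tuner = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=inner_cv)

# Outer CV: each outer fold reruns the whole search on its training part,
# then scores the tuned model on its held-out part.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_scores = cross_val_score(tuner, X, y, cv=outer_cv)
print(f"nested CV accuracy = {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```

Because the held-out outer fold never influences the choice of C, the outer mean is not inflated by the tuning step, which is the bias that reporting the best inner CV score would introduce.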

Community discussion

None selected for this lesson. Cross-validation is well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.