References: Reading the results: the confusion matrix, precision, recall, and ROC

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Machine Learning Fundamentals: The Confusion Matrix"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=Kdsp6soqA7o
• StatQuest with Josh Starmer: "Machine Learning Fundamentals: Sensitivity and Specificity"
YouTube: https://www.youtube.com/watch?v=vP06aMoz4v8
• StatQuest with Josh Starmer: "ROC and AUC"
YouTube: https://www.youtube.com/watch?v=4jRBRDbJemM
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (evaluation in the Classification module)
Repository: https://github.com/microsoft/ML-For-Beginners
License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.
  • StatQuest’s “Confusion Matrix” anchors the four-cell foundation; “Sensitivity and Specificity” anchors recall and specificity, with StatQuest’s healthcare framing; “ROC and AUC” anchors the threshold-independent view.
  • Microsoft’s ML-For-Beginners is the hands-on companion for computing these metrics in scikit-learn, including classification_report, confusion_matrix, and roc_auc_score.
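
As a rough illustration of the three scikit-learn calls named above, the sketch below runs them on a tiny hand-made dataset. The labels, scores, and the 0.5 threshold are invented for this example and do not come from StatQuest or the Microsoft curriculum.

```python
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical data: 1 = positive class (e.g. fraud), 0 = negative class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Hypothetical model scores (estimated probability of the positive class).
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.65, 0.8, 0.9]

# Turn scores into hard predictions at an assumed threshold of 0.5.
threshold = 0.5
y_pred = [1 if s >= threshold else 0 for s in y_score]

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=4 FP=2 FN=1 TP=3

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
print(f"precision={precision:.2f} recall={recall:.2f}")

# Per-class precision, recall, and F1 in one text table.
print(classification_report(y_true, y_pred))

# ROC AUC uses the raw scores, not the thresholded predictions.
auc = roc_auc_score(y_true, y_score)
print(f"ROC AUC={auc:.3f}")  # 20 of 24 positive/negative pairs ranked correctly
```

Note that `roc_auc_score` takes the scores rather than `y_pred`, which is what makes it independent of the threshold.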

The fraud worked example, the explicit “accuracy lies on imbalanced data” frame, the pick-the-metric-by-cost decision framework, and the closing-the-track synthesis are Clawdemy’s own.

  • StatQuest with Josh Starmer. The three videos above plus StatQuest’s broader evaluation series; recommended in order (confusion matrix -> sensitivity/specificity -> ROC/AUC).
  • Microsoft ML-For-Beginners. Hands-on lessons computing all the metrics in this lesson on real datasets.
  • Precision-recall curve. The companion to the ROC curve, often more informative on heavily imbalanced data, where ROC can look overly optimistic.
  • Multi-class metrics. Confusion matrices generalize to k x k; precision and recall extend per-class with macro and micro averaging schemes. The two-class case in this lesson is the foundation.
  • Cost-sensitive learning and threshold tuning. When you know the actual costs of false positives versus false negatives, you can set the threshold to minimize expected cost rather than choosing it by precision or recall alone.
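
The threshold-tuning idea in the last item can be sketched in a few lines of plain Python. The data, the candidate thresholds, the cost values, and the helper names `total_cost` and `best_threshold` are all hypothetical and chosen only to show the mechanics; dividing the total by the number of examples gives the average cost per example, an empirical estimate of expected cost.

```python
def total_cost(y_true, y_score, threshold, cost_fp, cost_fn):
    """Total misclassification cost when predicting positive for score >= threshold."""
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < threshold)
    return fp * cost_fp + fn * cost_fn


def best_threshold(y_true, y_score, candidates, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, y_score, t, cost_fp, cost_fn),
    )


# Hypothetical labels (1 = positive) and model scores.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.65, 0.8, 0.9]
candidates = [0.05, 0.25, 0.5, 0.75]

# Assumed costs: a missed positive (FN) is 10x worse than a false alarm (FP).
t_fn_heavy = best_threshold(y_true, y_score, candidates, cost_fp=1, cost_fn=10)
print(t_fn_heavy)  # 0.25: a low threshold catches every positive

# Assumed costs reversed: a false alarm is 10x worse than a miss.
t_fp_heavy = best_threshold(y_true, y_score, candidates, cost_fp=10, cost_fn=1)
print(t_fp_heavy)  # 0.75: a high threshold avoids every false alarm
```

The same scores yield different thresholds once the costs change, which is why the threshold is a decision about costs rather than a property of the model.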

None selected for this lesson. Classification metrics are well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.