Summary: Reading the results: the confusion matrix, precision, recall, and ROC

Accuracy is a poor default on imbalanced data: when only 1% of examples are positive, a model that predicts “negative” every time scores 99% accuracy while catching none of the positives. The honest metrics come from the confusion matrix, with precision, recall, and ROC/AUC telling you what your classifier is actually doing. This summary is the scan version of the full lesson, which closes the track.
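
To make the 99% example concrete, here is a minimal sketch in plain Python. The dataset (1,000 examples, 10 of them positive) and the always-negative "model" are hypothetical, chosen only to match the 1% figure above.

```python
# Hypothetical toy dataset: 1,000 examples, 10 of them positive (1%).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: fraction of actual positives that were flagged as positive.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.99
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```

The model is right 990 times out of 1,000 and still misses every positive, which is why accuracy alone says so little here.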

Core ideas

Accuracy lies on imbalanced data. The majority class dominates the count; you can score very high without catching the rare class that matters. On any imbalanced problem, look past accuracy.
The confusion matrix is the full picture. Four counts: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN); every threshold-based metric below is a ratio built from these.
Precision = TP / (TP + FP): of everything I flagged as positive, what fraction is real? High precision = few false alarms.
Recall (sensitivity) = TP / (TP + FN): of all actual positives, what fraction did I catch? High recall = few misses.
Specificity = TN / (TN + FP): of all actual negatives, what fraction did I correctly let through?
F1 = 2 * P * R / (P + R): harmonic mean of precision (P) and recall (R); drops sharply when either is bad. A single number to report when precision and recall matter about equally.
Precision and recall trade off through the decision threshold. Lower the threshold to raise recall (catch more) at the cost of precision (more false alarms). Raise it for the opposite.
Choose the metric for the cost of each error type: high recall when missing a positive is expensive (medical screening, fraud detection); high precision when false alarms are expensive (spam filter, search results); F1 when both matter; never accuracy alone on imbalanced data.
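ROC and AUC. The ROC curve plots true positive rate (recall) against false positive rate (1 - specificity) across every threshold; AUC summarizes it as a single number (0.5 random, 1.0 perfect). AUC can look optimistic on very imbalanced data, because a large pool of negatives keeps the false positive rate low even when false alarms outnumber true hits; the precision-recall curve is often the more honest companion.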
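
The ideas above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: the labels and scores are a made-up toy set, the helper names (`confusion_counts`, `ratio`, `summarize`, `auc`) are hypothetical, and returning 0.0 for an empty denominator is just one possible convention. The `auc` helper relies on the fact that the area under the ROC curve equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (TP, TN, FP, FN), flagging an example positive when score >= threshold."""
    tp = tn = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def ratio(numerator, denominator):
    """Divide, returning 0.0 for an empty denominator (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def summarize(tp, tn, fp, fn):
    """Build the metrics from the four confusion-matrix counts."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positives, 6 negatives, with model scores in [0, 1].
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]

# Sweep the decision threshold from strict to lenient.
for threshold in (0.75, 0.5, 0.3):
    m = summarize(*confusion_counts(y_true, scores, threshold))
    print(
        f"threshold={threshold:.2f}  precision={m['precision']:.2f}  "
        f"recall={m['recall']:.2f}  f1={m['f1']:.2f}"
    )

print(f"AUC = {auc(y_true, scores):.3f}")
```

On this toy data, lowering the threshold from 0.75 to 0.3 raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.57, which is the trade-off described above. AUC stays the same (about 0.958) because it does not depend on any single threshold.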

What changes for you

The right reaction to “the model is 99% accurate” is now baked in: ask on what data, what is the class balance, what are precision and recall, and at what threshold. That habit alone protects you from a lot of misleading headlines, especially in domains where the rare class is the one that matters. It also gives the L4 threshold a concrete second meaning: it is the precision-recall dial.

With this lesson the track closes. Across fifteen lessons we walked from “what is machine learning” through every workhorse model (regression, classification, ensembles, clustering, dimensionality reduction) and through the evaluation framework that decides whether any of them is doing the job. The classical-machine-learning toolbox is now yours in one piece.