Cheatsheet: Reading the results: the confusion matrix, precision, recall, and ROC
Why accuracy lies on imbalanced data
| Setup | Accuracy of “always predict majority” |
|---|---|
| 99% negatives, 1% positives | 99% (catches zero positives) |

Lesson: never rely on accuracy alone when classes are imbalanced.
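
The row above can be reproduced in a few lines. This is a minimal sketch on made-up data: the 990/10 split and the variable names are assumptions chosen to match the 99% / 1% setup, not data from a real system.

```python
# Hypothetical data: 990 negatives (0) and 10 positives (1),
# matching the "99% negatives, 1% positives" setup.
y_true = [0] * 990 + [1] * 10

# "Always predict majority": every example is labeled negative.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: fraction of real positives that were predicted positive.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)

print(f"accuracy = {accuracy:.1%}")  # accuracy = 99.0%
print(f"recall   = {recall:.1%}")    # recall   = 0.0%
```

The model looks excellent by accuracy while finding none of the cases it exists to find.
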
The confusion matrix
| | Actual: positive | Actual: negative |
|---|---|---|
| Predicted: positive | TP | FP |
| Predicted: negative | FN | TN |
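
As an illustration of how the four cells are filled, the sketch below counts them from two parallel label lists. The function name `confusion_counts`, the `positive` parameter, and the sample labels are hypothetical; libraries may lay the matrix out differently (for example with actual classes as rows), so check the orientation before reading off cells.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels.

    Hypothetical helper: `positive` names the label treated as the positive class.
    """
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # flagged, and really positive
        elif predicted == positive:
            fp += 1  # flagged, but actually negative (false alarm)
        elif actual == positive:
            fn += 1  # not flagged, but actually positive (miss)
        else:
            tn += 1  # not flagged, and really negative
    return tp, fp, fn, tn


# Made-up labels for illustration only.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 4)
```
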
Metrics from the matrix
| Metric | Formula | Question it answers |
|---|---|---|
| Accuracy | (TP+TN) / total | what fraction overall is correct? |
| Precision | TP / (TP+FP) | of my positive predictions, how many are real? |
| Recall (sensitivity) | TP / (TP+FN) | of all real positives, how many did I catch? |
| Specificity | TN / (TN+FP) | of all real negatives, how many did I correctly let through? |
| F1 | 2 * P * R / (P+R), where P = precision and R = recall | harmonic mean of precision and recall |
| False positive rate | FP / (FP+TN) | of all real negatives, how many did I wrongly flag? (x axis of the ROC curve) |
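
The formulas in this table translate directly into code. The following is a sketch, not a library API: `metrics_from_counts` and `safe_div` are hypothetical names, the counts in the usage lines are made up, and returning 0.0 for an undefined ratio (zero denominator) is a convention assumed here.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when den is zero (an assumed convention)."""
    return num / den if den else 0.0


def metrics_from_counts(tp, fp, fn, tn):
    """Compute the cheatsheet metrics from the four confusion-matrix cells."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),
    }


# Made-up counts for illustration only.
m = metrics_from_counts(tp=30, fp=10, fn=20, tn=40)
for name, value in m.items():
    print(f"{name:12s}{value:.3f}")
```

Note that the false positive rate is always 1 minus specificity, since both share the denominator FP+TN.
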
Worked example (1000 transactions, 10 fraud)
| | Actual: Fraud | Actual: Not Fraud |
|---|---|---|
| Predicted: Fraud | TP = 8 | FP = 50 |
| Predicted: Not Fraud | FN = 2 | TN = 940 |

| Metric | Value |
|---|---|
| Accuracy | 94.8% |
| Precision | ~13.8% |
| Recall | 80.0% |
| Specificity | ~94.9% |
| F1 | ~23.5% |

The 94.8% accuracy headline hides a precision of about 14%: roughly six of every seven fraud flags are false alarms.
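
For reference, the values in the table follow from the four counts by plain arithmetic. The F1 line uses the equivalent count form 2TP / (2TP + FP + FN), which is the formula 2PR / (P+R) with precision and recall substituted in.

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{\text{total}} = \frac{8 + 940}{1000} = 0.948 \\[4pt]
\text{Precision}   &= \frac{TP}{TP + FP} = \frac{8}{58} \approx 0.138 \\[4pt]
\text{Recall}      &= \frac{TP}{TP + FN} = \frac{8}{10} = 0.800 \\[4pt]
\text{Specificity} &= \frac{TN}{TN + FP} = \frac{940}{990} \approx 0.949 \\[4pt]
F_1 &= \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN} = \frac{16}{68} \approx 0.235
\end{aligned}
```
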
Picking the metric
| Situation | Optimize | Why |
|---|---|---|
| Medical screen, fraud, safety alerts | recall | missing a positive is much worse than a false alarm |
| Spam filter, top search results | precision | a false alarm is much worse than a miss |
| Both errors costly | F1 (or domain-weighted) | balance the two |
| Imbalanced data | NEVER accuracy alone | majority class dominates |
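
One common way to make the "domain-weighted" option concrete is the F-beta score, which generalizes F1. The table does not name a specific weighting, so treat this as one possible choice rather than the intended one; `f_beta` is a hypothetical helper, and the precision and recall values are taken from the fraud example above.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 favors recall, beta < 1 favors precision.

    beta = 1 reduces to F1. Returns 0.0 when the score is undefined
    (zero denominator), an assumed convention.
    """
    b2 = beta ** 2
    den = b2 * precision + recall
    if den == 0:
        return 0.0
    return (1 + b2) * precision * recall / den


# Precision and recall from the fraud worked example (8/58 and 8/10).
p, r = 8 / 58, 8 / 10
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}: {f_beta(p, r, beta):.3f}")
```

Because recall (0.80) is much higher than precision (about 0.14) in that example, the score rises as beta puts more weight on recall and falls as it puts more weight on precision.
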
Threshold dial
| Lower the threshold | Raise the threshold |
|---|---|
| recall up, precision usually down | precision usually up, recall down |
| more flags, more false alarms | fewer flags, more misses |
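
The dial can be seen by scoring the same examples at several thresholds. The scores, labels, and the helper `precision_recall_at` below are made up for illustration; an example is flagged when its score is at or above the threshold.

```python
# Hypothetical model scores and true labels (1 = positive), for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]


def precision_recall_at(threshold, scores, labels):
    """Return (precision, recall, number flagged) when flagging score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, tp + fp


for threshold in (0.85, 0.65, 0.35):
    p, r, flagged = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold:.2f} flagged={flagged} precision={p:.2f} recall={r:.2f}")
# threshold=0.85 flagged=2 precision=1.00 recall=0.50
# threshold=0.65 flagged=4 precision=0.75 recall=0.75
# threshold=0.35 flagged=7 precision=0.57 recall=1.00
```

On this toy data, each step down in threshold flags more examples, raising recall and lowering precision.
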
ROC curve and AUC
| Item | Detail |
|---|---|
| X axis | false positive rate (FP / (FP+TN)) |
| Y axis | true positive rate (recall) |
| Each point | a different threshold |
| Top-left corner | perfect classifier |
| Diagonal | random guessing |
| AUC | area under the ROC curve; 0.5 = random guessing, 1.0 = perfect |
| Caveat | AUC can look optimistic on very imbalanced data; use precision-recall curve as a check |
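
To make "each point is a different threshold" concrete, the sketch below builds ROC points by sweeping the threshold from high to low and then integrates them with the trapezoid rule. The names `roc_points` and `auc` and the sample data are hypothetical, and the code assumes both classes are present; in practice a library routine would normally be used instead.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) from (0, 0) to (1, 1), one per distinct score.

    Assumes labels are 0/1 and that both classes occur at least once.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Highest scores first: lowering the threshold admits examples in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under ROC points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores and labels for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(f"AUC = {auc(roc_points(scores, labels)):.3f}")  # AUC = 0.833
```

The result matches the ranking interpretation of AUC: of the 24 positive-negative pairs in this toy data, 20 have the positive scored higher, and 20/24 is about 0.833.
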