
Cheatsheet: Reading the results: the confusion matrix, precision, recall, and ROC

| Setup | Accuracy of “always predict majority” |
| --- | --- |
| 99% negatives, 1% positives | 99% (catches zero positives) |

Lesson: never rely on accuracy alone when classes are imbalanced.
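
The accuracy trap above can be reproduced in a few lines. This is a minimal sketch on a made-up data set of 990 negatives and 10 positives (the sizes are an assumption chosen to match the 99% / 1% split); `y_true` and `y_pred` are illustrative names.

```python
# Hypothetical data set: 990 negatives (0) and 10 positives (1), i.e. 99% / 1%.
y_true = [0] * 990 + [1] * 10

# "Always predict majority": every example is labelled negative.
y_pred = [0] * len(y_true)

# Accuracy is the fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Number of real positives the baseline actually flags.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(f"accuracy = {accuracy:.1%}, positives caught = {caught}")
```

The baseline scores 99% accuracy while catching none of the 10 positives, which is why accuracy alone says little here.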

| | Actual: positive | Actual: negative |
| --- | --- | --- |
| Predicted: positive | TP | FP |
| Predicted: negative | FN | TN |
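
As an illustration of how the four cells are filled, the sketch below counts them from paired labels. It assumes binary labels encoded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = positive, 0 = negative."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # flagged, and really positive
        elif predicted == 1 and actual == 0:
            fp += 1  # flagged, but actually negative (false alarm)
        elif predicted == 0 and actual == 1:
            fn += 1  # not flagged, but actually positive (miss)
        else:
            tn += 1  # not flagged, and really negative
    return tp, fp, fn, tn


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
```

The four counts always sum to the number of examples, which is a quick sanity check on any confusion matrix.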

| Metric | Formula | Question it answers |
| --- | --- | --- |
| Accuracy | (TP+TN) / total | What fraction overall is correct? |
| Precision | TP / (TP+FP) | Of my positive predictions, how many are real? |
| Recall (sensitivity) | TP / (TP+FN) | Of all real positives, how many did I catch? |
| Specificity | TN / (TN+FP) | Of all real negatives, how many did I correctly let through? |
| F1 | 2 * P * R / (P+R) | Harmonic mean of precision (P) and recall (R) |
| False positive rate | FP / (FP+TN) | x-axis of the ROC curve; equals 1 − specificity |

Worked example (1000 transactions, 10 fraud)

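| | Actual: Fraud | Actual: Not Fraud |
| --- | --- | --- |
| Predicted: Fraud | TP = 8 | FP = 50 |
| Predicted: Not Fraud | FN = 2 | TN = 940 |

| Metric | Value |
| --- | --- |
| Accuracy | 94.8% |
| Precision | ~13.8% |
| Recall | 80.0% |
| Specificity | ~94.9% |
| F1 | ~23.5% |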

The 94.8% accuracy headline hides a precision of only about 14%: 50 of the 58 fraud flags are false alarms.
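
The values in the worked example can be rechecked directly from the four counts. The sketch below applies the cheatsheet formulas as written; the variable names are illustrative.

```python
# Counts from the worked example: 1000 transactions, 10 of them fraud.
tp, fp, fn, tn = 8, 50, 2, 940
total = tp + fp + fn + tn

accuracy = (tp + tn) / total          # fraction of all predictions that are correct
precision = tp / (tp + fp)            # fraction of fraud flags that are real fraud
recall = tp / (tp + fn)               # fraction of real fraud that was flagged
specificity = tn / (tn + fp)          # fraction of legitimate transactions let through
f1 = 2 * precision * recall / (precision + recall)
fpr = fp / (fp + tn)                  # false positive rate, equal to 1 - specificity

for name, value in [
    ("accuracy", accuracy),
    ("precision", precision),
    ("recall", recall),
    ("specificity", specificity),
    ("F1", f1),
    ("false positive rate", fpr),
]:
    print(f"{name:>20}: {value:.1%}")
```

Note that these ratios are undefined when a denominator is zero; for example, precision has no value if the model flags nothing, so real code would need to handle that case explicitly.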

| Situation | Optimize | Why |
| --- | --- | --- |
| Medical screen, fraud, safety alerts | recall | missing a positive is much worse |
| Spam filter, top search results | precision | a false alarm is much worse |
| Both errors costly | F1 (or domain-weighted) | balance the two |
| Imbalanced data | NEVER accuracy alone | majority class dominates |
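
The table does not say which "domain-weighted" score it has in mind. One common choice is the F-beta score, which generalizes F1; the sketch below assumes that choice, and the function name `f_beta` is hypothetical. It reuses the precision and recall from the worked fraud example (8/58 and 8/10).

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    if precision == 0 and recall == 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall from the worked example (TP = 8, FP = 50, FN = 2).
precision, recall = 8 / 58, 8 / 10

f1 = f_beta(precision, recall)                 # beta = 1 gives the usual F1
f2 = f_beta(precision, recall, beta=2.0)       # leans toward recall
f_half = f_beta(precision, recall, beta=0.5)   # leans toward precision

print(f"F1={f1:.3f} F2={f2:.3f} F0.5={f_half:.3f}")
```

Because recall is high and precision is low in that example, the recall-weighted F2 comes out higher than F1 and the precision-weighted F0.5 comes out lower.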

| Lower the threshold | Raise the threshold |
| --- | --- |
| recall up, precision down | precision up, recall down |
| more flags, more false alarms | fewer flags, more misses |
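
The threshold trade-off can be seen by sweeping a cutoff over model scores. This is a sketch on invented scores and labels (both are assumptions for illustration), where an example is flagged when its score is at or above the threshold; `precision_recall_at` is a hypothetical helper.

```python
def precision_recall_at(scores, labels, threshold):
    """Return (number of flags, precision, recall) when flagging score >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    # Report 0.0 when a ratio is undefined (a convention chosen for this sketch).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return sum(flagged), precision, recall


# Hypothetical model scores and true labels (1 = positive), for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.75, 0.50, 0.25):
    flags, precision, recall = precision_recall_at(scores, labels, threshold)
    print(f"threshold={threshold:.2f} flags={flags} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, lowering the threshold from 0.75 to 0.25 raises the flag count from 3 to 7 and recall from 0.50 to 1.00, while precision drops from about 0.67 to about 0.57. Recall can only stay flat or rise as the threshold drops; precision usually falls but is not guaranteed to fall at every step.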

| Item | Detail |
| --- | --- |
| X axis | false positive rate (FP / (FP+TN)) |
| Y axis | true positive rate (recall) |
| Each point | a different threshold |
| Top-left corner | perfect classifier |
| Diagonal | random guessing |
| AUC | area under the ROC curve; 0.5 random, 1.0 perfect |
| Caveat | AUC can look optimistic on very imbalanced data; use a precision-recall curve as a check |
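
To make the ROC items concrete, the sketch below builds the curve one threshold at a time and integrates it with the trapezoidal rule. It is an illustrative implementation, not a library API: `roc_points` and `auc` are hypothetical names, the scores and labels are invented, and the code assumes both classes are present.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), one per distinct threshold, from (0, 0) to (1, 1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by score, highest first: lowering the threshold admits examples in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a group of tied scores.
        last_of_group = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_group:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a curve given as (x, y) points, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores and true labels (1 = positive), for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

points = roc_points(scores, labels)
print(f"AUC = {auc(points):.3f}")
```

On this toy data the AUC is 0.875, which matches the fraction of (positive, negative) pairs in which the positive example receives the higher score (21 of 24).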