Cheatsheet: Overfitting and the bias-variance tradeoff

| Term | Meaning | Fix |
| --- | --- | --- |
| Bias (underfit) | model too simple to capture the pattern | more flexible model, more features |
| Variance (overfit) | model too sensitive to the specific training sample | simpler model, more data, regularization, averaging |
| Irreducible noise | inherent randomness; hard floor | nothing |

Expected total error (squared loss) = bias^2 + variance + irreducible noise.
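
The decomposition can be checked numerically. This is a minimal sketch, assuming a synthetic target sin(2πx), Gaussian noise, and a k-nearest-neighbour regressor as the model. These are illustrative choices, not part of the cheatsheet, and `true_f`, `knn_predict`, and `decompose` are hypothetical names. The model is refit on many fresh training sets, and bias^2 and variance are measured at fixed evaluation points.

```python
import math
import random
from statistics import fmean, pvariance


def true_f(x):
    """Assumed noise-free target; any smooth function would do."""
    return math.sin(2 * math.pi * x)


def knn_predict(train, x0, k):
    """Average the targets of the k training points nearest to x0."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    return fmean(y for _, y in nearest)


def decompose(k, n_train=40, n_datasets=200, sigma=0.3, seed=0):
    """Estimate bias^2 and variance of k-NN over repeated training sets."""
    rng = random.Random(seed)
    grid = [i / 20 for i in range(1, 20)]  # fixed evaluation points
    preds = {x0: [] for x0 in grid}
    for _ in range(n_datasets):
        train = []
        for _ in range(n_train):
            x = rng.random()
            train.append((x, true_f(x) + rng.gauss(0, sigma)))
        for x0 in grid:
            preds[x0].append(knn_predict(train, x0, k))
    # Bias^2: squared gap between the average prediction and the truth.
    bias2 = fmean((fmean(preds[x0]) - true_f(x0)) ** 2 for x0 in grid)
    # Variance: spread of the predictions across training sets.
    variance = fmean(pvariance(preds[x0]) for x0 in grid)
    # Mean squared error against the noise-free truth.
    mse_vs_truth = fmean(
        fmean((p - true_f(x0)) ** 2 for p in preds[x0]) for x0 in grid
    )
    noise = sigma ** 2  # irreducible part, known here because we chose sigma
    return {
        "bias2": bias2,
        "variance": variance,
        "noise": noise,
        "mse_vs_truth": mse_vs_truth,
        "total": bias2 + variance + noise,
    }


if __name__ == "__main__":
    for k in (1, 5, 40):
        print(k, decompose(k))
```

Under these assumptions, `mse_vs_truth` equals `bias2 + variance` up to rounding, and the noise term is added on top when the test targets are noisy. k = 1 should show low bias^2 and high variance, and k = n_train the reverse. Exact numbers depend on the seed.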

| Complexity | Bias | Variance | Test error |
| --- | --- | --- | --- |
| Too simple | high | low | high (underfit) |
| Sweet spot | moderate | moderate | low |
| Too complex | low | high | high (overfit) |

| Training error | Test error | Diagnosis | Try |
| --- | --- | --- | --- |
| high | high (similar) | high bias / underfit | add complexity |
| low | high (big gap) | high variance / overfit | simplify, more data, regularize |
| low | low (small gap) | good fit | ship it |
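
The diagnosis rows above can be written as a small decision rule. This is a sketch only: `diagnose` is a hypothetical helper, and the thresholds `acceptable_error` and `max_gap` are problem-specific assumptions the caller must choose, not standard values.

```python
def diagnose(train_error, test_error, acceptable_error, max_gap):
    """Map a (training error, test error) pair to a diagnosis.

    acceptable_error: training error above this counts as "high" (assumed).
    max_gap: a test-minus-train gap above this counts as "big" (assumed).
    """
    gap = test_error - train_error
    if train_error > acceptable_error:
        # High training error is checked first: the model cannot even fit
        # the data it has seen, whatever the gap looks like.
        return "high bias / underfit: add complexity"
    if gap > max_gap:
        return "high variance / overfit: simplify, more data, regularize"
    return "good fit"


if __name__ == "__main__":
    print(diagnose(0.30, 0.32, acceptable_error=0.10, max_gap=0.05))
    print(diagnose(0.02, 0.25, acceptable_error=0.10, max_gap=0.05))
    print(diagnose(0.04, 0.06, acceptable_error=0.10, max_gap=0.05))
```

A model with both high training error and a big gap is reported as underfit here. That ordering is a design choice of this sketch.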

| Type | Penalty | Effect |
| --- | --- | --- |
| Ridge (L2) | sum of squared coefficients | shrinks all toward zero, lowers variance |
| Lasso (L1) | sum of absolute coefficients | shrinks; can drive some to zero (feature selection) |
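
The difference between the two penalties is easiest to see in the special case of an orthonormal design (X^T X = I), where both problems have a per-coefficient closed form. This sketch assumes the objectives 0.5 * ||y - Xb||^2 + 0.5 * lam * ||b||_2^2 for ridge and 0.5 * ||y - Xb||^2 + lam * ||b||_1 for lasso. The function names and the example coefficients are hypothetical.

```python
def ridge_shrink(b_ols, lam):
    """Ridge under an orthonormal design: scale every coefficient by 1/(1+lam)."""
    return [b / (1.0 + lam) for b in b_ols]


def soft_threshold(b, lam):
    """Move b toward zero by lam, stopping at zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_shrink(b_ols, lam):
    """Lasso under an orthonormal design: soft-threshold each coefficient."""
    return [soft_threshold(b, lam) for b in b_ols]


if __name__ == "__main__":
    b_ols = [3.0, -0.3, 0.5, -2.0]  # made-up least-squares coefficients
    print(ridge_shrink(b_ols, lam=0.5))  # all smaller in magnitude, none zero
    print(lasso_shrink(b_ols, lam=0.5))  # → [2.5, 0.0, 0.0, -1.5]
```

With general (correlated) features there is no such closed form for lasso, but the qualitative behaviour is the same: ridge scales coefficients down, lasso can zero them out.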
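
| Method | Default bias | Default variance | Tends to |
| --- | --- | --- | --- |
| Linear / logistic regression | high | low | underfit on complex problems |
| Deep unpruned tree | low | high | overfit |
| Random forest | low | low | sweet spot (averaging reduces variance) |
| Boosting (long run) | low | rising | can overfit; needs tuning |
| SVM (soft margin C) | set by C (larger C: lower) | set by C (larger C: higher) | either, depending on C |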
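
The variance-averaging effect can be illustrated without a full random forest. The sketch below uses bagging (bootstrap averaging) of a 1-nearest-neighbour regressor, which stands in for a deep tree as a low-bias, high-variance base model. This is plain bagging, not a random forest, and the data generator and all names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def one_nn(train, x0):
    """Predict the target of the single nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x0))[1]


def bagged_one_nn(train, x0, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # sample with replacement
        preds.append(one_nn(boot, x0))
    return fmean(preds)


def prediction_variances(x0=0.3, n_train=30, n_datasets=300,
                         n_models=25, sigma=0.3, seed=1):
    """Variance of the prediction at x0 across fresh training sets."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_datasets):
        train = []
        for _ in range(n_train):
            x = rng.random()
            y = math.sin(2 * math.pi * x) + rng.gauss(0, sigma)
            train.append((x, y))
        single.append(one_nn(train, x0))
        bagged.append(bagged_one_nn(train, x0, n_models, rng))
    return pvariance(single), pvariance(bagged)


if __name__ == "__main__":
    var_single, var_bagged = prediction_variances()
    print("single 1-NN variance:", var_single)
    print("bagged 1-NN variance:", var_bagged)
```

Under these assumptions the bagged predictor should show clearly lower variance than the single model. A random forest additionally randomizes the features considered at each split, which this sketch does not model.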

| Pitfall | Reality |
| --- | --- |
| "Training accuracy = quality" | training fit is not generalization |
| Only adding data | fixes variance, does little for bias |
| Only adding complexity | fixes bias, raises variance |
| Ignoring the noise floor | nothing beats irreducible noise |
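
The "only adding data" pitfall can be reproduced in a few lines. This is a sketch under assumed conditions: the true relationship is y = x^2 plus Gaussian noise on [-1, 1], and the model is a straight line, which is too simple for it. `fit_line`, `sample`, and `line_test_mse` are hypothetical names.

```python
import random
from statistics import fmean


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    mx = fmean(x for x, _ in points)
    my = fmean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def sample(n, sigma, rng):
    """Draw n points from the assumed process y = x^2 + noise."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, x * x + rng.gauss(0, sigma)) for x in xs]


def line_test_mse(n_train, sigma=0.1, n_test=5000, seed=0):
    """Fit a line on n_train points and score it on fresh test data."""
    rng = random.Random(seed)
    slope, intercept = fit_line(sample(n_train, sigma, rng))
    test = sample(n_test, sigma, rng)
    return fmean((y - (slope * x + intercept)) ** 2 for x, y in test)


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, line_test_mse(n))
    print("noise floor:", 0.1 ** 2)
```

In this setup the test error of the line stays several times above the noise floor of sigma^2 = 0.01 however large the training set gets. The remaining gap is bias, which only a more flexible model can remove.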