Cheatsheet: Overfitting and the bias-variance tradeoff
The two failure modes
| Term | Meaning | Fix |
|---|---|---|
| Bias (underfit) | model too simple to capture the pattern | more flexible model, more features |
| Variance (overfit) | model too sensitive to the particular training sample it saw | simpler model, more data, regularization, averaging |
| Irreducible noise | inherent randomness in the data; a hard floor on test error | none (no model can remove it) |
Expected test error (squared-error loss, averaged over training sets) = bias^2 + variance + irreducible noise.
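
The decomposition can be checked numerically at a single test point x0. The sketch below rests on assumptions that are not in the cheatsheet: the true function is sin(2πx), the noise is Gaussian with σ = 0.3, and the model is k-nearest-neighbours regression (small k = flexible, large k = rigid). It refits the model on many fresh training sets and compares the measured squared error with bias^2 + variance + σ^2.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed ground truth for this illustration (not from the cheatsheet).
    return math.sin(2 * math.pi * x)

def knn_predict(xs, ys, x0, k):
    # k-NN regression: average the targets of the k training points nearest x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def decompose(k, x0=0.25, n=50, sigma=0.3, trials=2000, seed=0):
    """Monte Carlo estimate of bias^2, variance, noise and measured MSE at x0."""
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        # Draw a fresh training set; "fitting" k-NN just means storing it.
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
        pred = knn_predict(xs, ys, x0, k)
        # Score against a fresh noisy observation at x0.
        y_new = true_f(x0) + rng.gauss(0, sigma)
        preds.append(pred)
        sq_errors.append((y_new - pred) ** 2)
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance, sigma ** 2, statistics.fmean(sq_errors)

for k in (1, 25):
    b2, var, noise, mse = decompose(k)
    print(f"k={k:2d}  bias^2={b2:.3f}  variance={var:.3f}  "
          f"noise={noise:.3f}  sum={b2 + var + noise:.3f}  measured MSE={mse:.3f}")
```

With k = 1 the prediction follows a single noisy neighbour (tiny bias, large variance); with k = 25 it averages over half the interval, trading variance for bias. In both cases the three-term sum should land close to the measured MSE, up to Monte Carlo error.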
The U-curve
| Complexity | Bias | Variance | Test error |
|---|---|---|---|
| Too simple | high | low | high (underfit) |
| Sweet spot | moderate | moderate | lowest |
| Too complex | low | high | high (overfit) |
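
One way to see the U-curve is to sweep a single complexity knob and record training and test error. The sketch below uses k-nearest-neighbours regression, where small k means a flexible model; the sine ground truth, the noise level σ = 0.3 and the data sizes are assumptions chosen only for illustration.

```python
import math
import random

def true_f(x):
    # Assumed ground truth for this illustration (not from the cheatsheet).
    return math.sin(2 * math.pi * x)

def make_data(n, sigma, rng):
    # n inputs uniform on [0, 1]; targets = true_f(x) + Gaussian noise.
    xs = [rng.random() for _ in range(n)]
    return xs, [true_f(x) + rng.gauss(0, sigma) for x in xs]

def knn_predict(xs, ys, x0, k):
    # k-NN regression: average the targets of the k training points nearest x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(train_xs, train_ys, eval_xs, eval_ys, k):
    # Mean squared error of a k-NN model (fit on train_*) evaluated on eval_*.
    preds = [knn_predict(train_xs, train_ys, x, k) for x in eval_xs]
    return sum((y - p) ** 2 for y, p in zip(eval_ys, preds)) / len(eval_ys)

rng = random.Random(0)
train_x, train_y = make_data(200, 0.3, rng)
test_x, test_y = make_data(1000, 0.3, rng)

# Complexity falls as k grows: k = 1 memorizes the training set,
# k = 200 (every training point) always predicts the global mean.
results = {}
for k in (1, 3, 10, 30, 100, 200):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Training error is exactly zero at k = 1 (each training point is its own nearest neighbour) and should climb as k grows, while test error should fall to a minimum at a moderate k and rise again toward k = 200: the U-curve traversed from the "too complex" end to the "too simple" end.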
The diagnostic (the capability)
| Training error | Test error | Diagnosis | Try |
|---|---|---|---|
| high | high (similar) | high bias / underfit | add complexity |
| low | high (big gap) | high variance / overfit | simplify, more data, regularize |
| low | low (small gap) | good fit | ship it |
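
The table reads naturally as a small decision rule. The sketch below assumes two thresholds you would pick yourself: `acceptable_err` (what counts as "high" training error, for example relative to a baseline or the noise floor) and `max_gap` (what counts as a "big" train/test gap). Both names and the example values are hypothetical.

```python
def diagnose(train_err: float, test_err: float,
             acceptable_err: float, max_gap: float) -> str:
    """Map training/test error to a row of the diagnostic table.

    acceptable_err and max_gap are hypothetical, problem-specific thresholds.
    """
    gap = test_err - train_err
    if train_err > acceptable_err:
        # Even the training data is fit poorly: the model misses the pattern.
        return "high bias / underfit: add complexity"
    if gap > max_gap:
        # Training data is fit well, but the fit does not carry over to test data.
        return "high variance / overfit: simplify, more data, regularize"
    return "good fit: ship it"

print(diagnose(0.30, 0.32, acceptable_err=0.10, max_gap=0.05))  # → high bias / underfit: add complexity
print(diagnose(0.02, 0.25, acceptable_err=0.10, max_gap=0.05))  # → high variance / overfit: simplify, more data, regularize
print(diagnose(0.05, 0.07, acceptable_err=0.10, max_gap=0.05))  # → good fit: ship it
```

The table has no row for high training error combined with a large gap; this sketch simply reports the bias problem in that case.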
Regularization
| Type | Penalty | Effect |
|---|---|---|
| Ridge (L2) | sum of squared coefficients | shrinks all coefficients toward zero (rarely to exactly zero); lowers variance |
| Lasso (L1) | sum of absolute coefficients | shrinks coefficients and can set some exactly to zero (feature selection) |
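
The difference between the two penalties is easiest to see in the special case where the feature columns are orthonormal (X^T X = I), because both solutions then have closed forms in terms of the ordinary least-squares coefficients b. That orthonormal assumption and the example coefficients are illustrative; with correlated features neither formula applies and a numerical solver is needed.

```python
import math

def ridge_orthonormal(ols_coefs, lam):
    # Ridge: minimize ||y - Xw||^2 + lam * sum(w_j^2).
    # With orthonormal columns the solution is w_j = b_j / (1 + lam):
    # every coefficient shrinks by the same factor and none reaches zero.
    return [b / (1 + lam) for b in ols_coefs]

def lasso_orthonormal(ols_coefs, lam):
    # Lasso: minimize 0.5 * ||y - Xw||^2 + lam * sum(|w_j|).
    # With orthonormal columns the solution is soft-thresholding:
    # w_j = sign(b_j) * max(|b_j| - lam, 0), so small coefficients become exactly 0.
    out = []
    for b in ols_coefs:
        shrunk = max(abs(b) - lam, 0.0)
        # Keep the sign of b; report exact zeros as plain 0.0 (not -0.0).
        out.append(math.copysign(shrunk, b) if shrunk > 0 else 0.0)
    return out

ols = [3.0, -0.5, 0.2]              # hypothetical least-squares coefficients
print(ridge_orthonormal(ols, 1.0))  # → [1.5, -0.25, 0.1]
print(lasso_orthonormal(ols, 1.0))  # → [2.0, 0.0, 0.0]
```

Ridge halves every coefficient, while lasso subtracts a fixed amount and zeroes the two small ones, which is the feature-selection effect in the table.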
Where each method sits
| Method | Default bias | Default variance | Tends to |
|---|---|---|---|
| Linear / logistic regression | high | low | underfit on complex problems |
| Deep unpruned tree | low | high | overfit |
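| Random forest | low | low | land near the sweet spot (averaging many trees cancels much of the variance) |
| Boosting (many rounds) | low | rises with more rounds | overfit if run too long; tune the number of rounds and learning rate |
| SVM (soft-margin C) | set by C | set by C | large C: lower bias, higher variance; small C: the reverse |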
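
The random-forest row rests on averaging: many low-bias, high-variance learners, averaged, give a lower-variance prediction. The sketch below shows only the bagging half of that idea, with 1-nearest-neighbour regression standing in for a deep tree; a real random forest also decorrelates its trees by sampling features, which this sketch does not do. The sine ground truth, noise level, bag count and test point are all assumptions.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed ground truth for this illustration (not from the cheatsheet).
    return math.sin(2 * math.pi * x)

def one_nn(xs, ys, x0):
    # 1-nearest-neighbour regression: low bias, high variance.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def bagged_one_nn(xs, ys, x0, n_bags, rng):
    # Bagging: fit the same learner on bootstrap resamples and average them.
    n = len(xs)
    preds = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        preds.append(one_nn([xs[i] for i in idx], [ys[i] for i in idx], x0))
    return statistics.fmean(preds)

def prediction_variance(predict, x0=0.25, n=50, sigma=0.3, trials=300, seed=0):
    # Variance of the prediction at x0 across independent training sets.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]
        preds.append(predict(xs, ys, x0, rng))
    return statistics.pvariance(preds)

single = prediction_variance(lambda xs, ys, x0, rng: one_nn(xs, ys, x0))
bagged = prediction_variance(lambda xs, ys, x0, rng: bagged_one_nn(xs, ys, x0, 50, rng))
print(f"single 1-NN variance: {single:.4f}   bagged (50 bags): {bagged:.4f}")
```

The bagged predictor effectively averages the targets of a few nearby points instead of trusting one, so its variance across training sets should come out well below that of a single 1-NN fit.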
Pitfalls
| Pitfall | Reality |
|---|---|
| “Training accuracy = quality” | a good training fit does not imply generalization; judge on held-out data |
| Only adding data | fixes variance, does little for bias |
| Only adding complexity | fixes bias, raises variance |
| Ignoring the noise floor | no model can push test error below the irreducible noise |
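
The "only adding data" pitfall shows up clearly with a deliberately underfit model. The sketch assumes a linear truth f(x) = x on [0, 1] with Gaussian noise and fits the simplest possible model, a constant (the mean of y), then measures bias^2 and variance of its prediction at x0 = 0.9 for a small and a large training set. All of these values are illustrative assumptions.

```python
import random
import statistics

def constant_model_stats(n, x0=0.9, sigma=0.3, trials=500, seed=0):
    """Bias^2 and variance at x0 of a constant model fit to n points of y = x + noise."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [x + rng.gauss(0, sigma) for x in xs]
        # The "model" ignores x entirely: it always predicts mean(y).
        preds.append(statistics.fmean(ys))
    bias_sq = (statistics.fmean(preds) - x0) ** 2  # true f(x0) = x0
    return bias_sq, statistics.pvariance(preds)

for n in (10, 1000):
    b2, var = constant_model_stats(n)
    print(f"n={n:5d}  bias^2={b2:.4f}  variance={var:.6f}")
```

Going from 10 to 1000 points should cut the variance roughly a hundredfold, while bias^2 stays near (0.5 - 0.9)^2 = 0.16 because no amount of data lets a constant follow a slope.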