Cheatsheet: Turning weak learners strong: boosting
The core idea
| Item | Detail |
|---|---|
| Building block | a weak learner (shallow tree / stump) |
| Strategy | build in sequence; each fixes the previous errors |
| Combine | weighted, additive sum of all learners |
| Result | many weak learners add up to one strong model |
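
The "weighted, additive sum" can be written compactly. The symbols here are notation chosen for this sketch, not taken from the table: $F_0$ is the starting guess, $h_m$ is the $m$-th weak learner, and $\alpha_m$ is its weight (a per-learner vote in AdaBoost, typically a shared learning rate in gradient boosting).

```latex
F_M(x) = F_0(x) + \sum_{m=1}^{M} \alpha_m \, h_m(x)
```
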
Two flavors
| Flavor | How each new learner targets errors |
|---|---|
| AdaBoost | re-weight misclassified examples so the next learner focuses on them |
| Gradient boosting | train the next tree on the residuals (current error); add it, scaled by the learning rate |
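
The AdaBoost row can be illustrated with a single re-weighting round. This is a minimal sketch assuming the classic discrete binary variant with a weighted error strictly between 0 and 0.5. The function name `adaboost_reweight` and the four-example data are hypothetical, chosen only for illustration.

```python
import math

def adaboost_reweight(weights, correct):
    """One re-weighting round: return (alpha, new_weights).

    weights: current example weights, assumed to sum to 1.
    correct: list of bools, True where the weak learner was right.
    Assumes 0 < weighted error < 0.5 (hypothetical helper, not a library API).
    """
    # Weighted error of the weak learner under the current weights.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    # The learner's weight in the final sum; larger when the error is smaller.
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correct examples, grow weights of mistakes.
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    # Renormalize so the weights sum to 1 again.
    total = sum(raw)
    return alpha, [r / total for r in raw]

# Four equally weighted examples; the learner gets the last one wrong.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
alpha, new_weights = adaboost_reweight(weights, correct)
```

After this round the single misclassified example carries half of the total weight, so the next learner cannot do well without getting it right.
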
Residual trace (true value 50, start 40)
| Step | Adds | Prediction | Residual |
|---|---|---|---|
| start | — | 40 | +10 |
| tree 1 | +6 | 46 | +4 |
| tree 2 | +3 | 49 | +1 |
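
The trace above follows one example. The same loop over a whole dataset might look like the sketch below, which assumes squared-error loss and depth-1 trees (stumps) on a single feature. The helper names (`fit_stump`, `predict_stump`, `gradient_boost`) and the toy data are hypothetical, and this is not how any particular library implements it.

```python
def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals; return (threshold, left_value, right_value)."""
    best = None
    # Try each distinct x as a split point; skip the largest so both sides are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv

def gradient_boost(xs, ys, n_rounds, learning_rate):
    """Return (base, stumps, history), where history is the training MSE after each round."""
    base = sum(ys) / len(ys)          # starting prediction: the mean target
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_rounds):
        # Residual = what the current model still gets wrong.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add the new tree's output, scaled by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, history

# Toy data, made up for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 20, 22, 40, 42]
base, stumps, history = gradient_boost(xs, ys, n_rounds=5, learning_rate=0.5)
```

With a learning rate between 0 and 1, the training error in `history` never increases from one round to the next, which mirrors the shrinking residual column in the table.
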
Bagging vs boosting
| | Random forest (bagging) | Boosting |
|---|---|---|
| Trees built | independently, parallel | sequentially |
| Each tree | deep, full-grown | weak / shallow |
| Each tree’s job | own best guess | fix current errors |
| Combine | vote / average | weighted additive sum |
| Mainly reduces | variance | bias |
| Overfit by adding trees | no (plateaus) | yes (needs tuning) |
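
The "Combine" row is the easiest difference to see in code. This is a rough sketch for regression only. Both function names and all the numbers are hypothetical, and a shared learning rate is assumed as the boosting weight.

```python
def bagging_predict(tree_predictions):
    """Bagging: each tree makes its own full guess; the guesses are averaged."""
    return sum(tree_predictions) / len(tree_predictions)

def boosting_predict(base, tree_corrections, learning_rate):
    """Boosting: start from a base guess and add each tree's scaled correction."""
    return base + learning_rate * sum(tree_corrections)

# Three independent trees, each predicting the target directly.
bagged = bagging_predict([48, 52, 50])

# Three sequential trees, each predicting a correction to the running total.
boosted = boosting_predict(40, [10, 6, 4], learning_rate=0.5)
```

In the bagging call every number is a complete prediction. In the boosting call the numbers only make sense as corrections added to the base value.
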
Pitfalls
| Pitfall | Reality |
|---|---|
| Confusing it with bagging | bagging = parallel/deep; boosting = sequential/weak |
| More rounds always help | boosting can overfit; watch held-out (validation) error |
| Ignoring the learning rate | too high overfits; too low needs many more rounds |
| Where in the wild | XGBoost / LightGBM / CatBoost dominate tabular data |
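
One common guard against the "more rounds always help" pitfall is early stopping: track the error on held-out data after each round and stop once it has stopped improving. The sketch below assumes you already have that list of errors. The function name `best_round`, the `patience` parameter, and the error values are hypothetical.

```python
def best_round(val_errors, patience):
    """Return the 1-based round with the lowest held-out error seen before stopping.

    Stops scanning once `patience` consecutive rounds fail to beat the best so far.
    """
    best_err = float("inf")
    best_idx = 0
    for i, err in enumerate(val_errors, start=1):
        if err < best_err:
            best_err, best_idx = err, i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` rounds: stop adding trees
    return best_idx

# Hypothetical held-out error after each boosting round.
val_errors = [0.40, 0.31, 0.27, 0.25, 0.26, 0.28, 0.24]
stop_at = best_round(val_errors, patience=2)
```

A small `patience` stops sooner but can miss a late improvement, as it does here with the final value. A larger one spends more rounds before giving up.
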