Summary: Turning weak learners strong: boosting
Boosting builds trees sequentially, training each one to fix the errors the previous trees still make, so a chain of weak learners adds up to one strong model. It is the opposite of the random forest’s parallel, independent averaging, and on tabular data it often produces the most accurate model of all. This summary is a quick-scan version of the full lesson.
Core ideas
- Weak learners, chained. The building block is a weak learner (a shallow tree, often a single-split stump) that does only slightly better than chance. Boosting chains many of them so that each one patches the errors left by the ones before it.
- AdaBoost re-weights the data. After each learner, it raises the weight of the examples that learner got wrong so the next one focuses there, and it weights each learner’s vote by its accuracy.
- Gradient boosting chases residuals. Each new tree predicts the leftover error (the residual: the gap between the truth and the current prediction); adding that tree, scaled by a learning rate, shrinks the error. For squared-error loss the residual is exactly the negative gradient of the loss, so this is gradient descent with trees as the steps.
- Learning rate (shrinkage). It scales each tree’s contribution down so the ensemble takes many small steps instead of a few large ones; the smaller the rate, the more trees it takes.
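- Boosting can overfit. In a forest, adding more trees does not cause overfitting; boosting, by contrast, starts fitting noise if you add too many trees or use too large a learning rate. It needs careful tuning, for example choosing the number of trees by early stopping on a validation set.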
- Bagging vs boosting: a forest averages strong, independent trees in parallel to cut variance; boosting chains weak, dependent trees in sequence to cut bias.
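The AdaBoost bullet can be made concrete with a minimal from-scratch sketch in standard-library Python. It assumes discrete AdaBoost for binary labels coded as +1/−1, uses one-split decision stumps as the weak learners, and sets each stump's vote weight to ½·ln((1 − ε)/ε), where ε is the stump's weighted error. The helper names (`fit_stump`, `stump_predict`, `adaboost`, `adaboost_predict`) are hypothetical, not any library's API.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best one-split stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # Predict +polarity when feature j <= thr, otherwise -polarity.
                preds = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (vote_weight, stump)."""
    w = [1.0 / len(X)] * len(X)  # start with uniform example weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # lower weighted error -> bigger vote
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples (y * h = -1), down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]  # renormalize so the weights sum to 1
    return ensemble

def adaboost_predict(ensemble, row):
    """Sign of the accuracy-weighted vote."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: no single threshold separates these labels.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([adaboost_predict(model, row) for row in X])  # → [1, 1, -1, -1, 1, 1]
```

On this toy data the best single stump still misclassifies two of the six points, but after three rounds of re-weighting the weighted vote of three stumps classifies all of them correctly.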
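The claim that gradient boosting is gradient descent with trees as the steps can be written out in a few lines. The symbols are this sketch's own, not the lesson's: F_{m-1} is the ensemble after m − 1 trees, ν is the learning rate, L is the loss, and h_m is the m-th tree.

```latex
\begin{aligned}
r_i &= -\left.\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right|_{F = F_{m-1}}
  && \text{pseudo-residuals: negative gradient at each training point} \\
h_m &= \text{a tree fit to the pairs } (x_i, r_i)
  && \text{the tree plays the role of the step direction} \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
  && \text{compare } \theta \leftarrow \theta - \eta\, \nabla_\theta L \\
L(y, F) &= \tfrac{1}{2}\,(y - F)^2 \;\Rightarrow\; r_i = y_i - F_{m-1}(x_i)
  && \text{for squared error, the pseudo-residual is the ordinary residual}
\end{aligned}
```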
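The residual, learning-rate, and overfitting bullets can be tied together in one minimal sketch of gradient boosting for regression. It assumes squared-error loss, a single numeric feature, and one-split regression stumps as the trees. `fit_stump`, `gradient_boost`, and `make_data` are hypothetical names, and the noisy step-function data is synthetic, made up for illustration. After every tree the sketch records training and validation error, then keeps the tree count with the lowest validation error (early stopping), which is one common way to tune the number of trees.

```python
import random

def fit_stump(x, r):
    """Least-squares regression stump on one numeric feature (hypothetical helper)."""
    best = None
    for thr in sorted(set(x))[:-1]:  # splitting at the max would leave the right side empty
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda xi: lm if xi <= thr else rm

def gradient_boost(x_tr, y_tr, x_val, y_val, n_trees=200, learning_rate=0.1):
    """Squared-loss gradient boosting that logs train/validation MSE after each tree."""
    base = sum(y_tr) / len(y_tr)  # constant start: the mean minimizes squared error
    pred_tr, pred_val = [base] * len(y_tr), [base] * len(y_val)
    train_mse, val_mse = [], []
    for _ in range(n_trees):
        # Residuals y - F(x) are the negative gradient of 0.5 * (y - F(x)) ** 2.
        residuals = [yi - pi for yi, pi in zip(y_tr, pred_tr)]
        tree = fit_stump(x_tr, residuals)
        # Small step: add the new tree's output, scaled down by the learning rate.
        pred_tr = [pi + learning_rate * tree(xi) for pi, xi in zip(pred_tr, x_tr)]
        pred_val = [pi + learning_rate * tree(xi) for pi, xi in zip(pred_val, x_val)]
        train_mse.append(sum((pi - yi) ** 2 for pi, yi in zip(pred_tr, y_tr)) / len(y_tr))
        val_mse.append(sum((pi - yi) ** 2 for pi, yi in zip(pred_val, y_val)) / len(y_val))
    # Early stopping: keep the tree count whose validation error is lowest.
    best_n = 1 + min(range(n_trees), key=lambda i: val_mse[i])
    return best_n, train_mse, val_mse

# Synthetic data (illustrative only): a step at x = 5 plus Gaussian noise.
rng = random.Random(0)

def make_data(n):
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [(1.0 if xi > 5 else 0.0) + rng.gauss(0, 0.5) for xi in x]
    return x, y

x_tr, y_tr = make_data(40)
x_val, y_val = make_data(40)
best_n, train_mse, val_mse = gradient_boost(x_tr, y_tr, x_val, y_val)
print(f"best tree count: {best_n} of {len(val_mse)}")
print(f"train MSE at best / at end: {train_mse[best_n - 1]:.3f} / {train_mse[-1]:.3f}")
print(f"val MSE at best / at end:   {val_mse[best_n - 1]:.3f} / {val_mse[-1]:.3f}")
```

With squared loss and a learning rate between 0 and 1, each least-squares stump can only lower the training error, so the training curve alone never says when to stop. Watching a held-out validation curve is what exposes the point where later trees start fitting noise.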
What changes for you
When you hear that “for tabular data, gradient boosting still beats deep learning,” you now know exactly what that means: XGBoost, LightGBM, and CatBoost, all gradient-boosted tree libraries, are the reigning champions on spreadsheet-shaped data, from fraud detection to search ranking. You also gain a diagnostic instinct: a forest and a boosted model are not interchangeable. They fix opposite problems, so knowing whether you are fighting variance or bias tells you which to reach for, a judgment Phase 4 makes precise. And the recurring theme holds: gradient boosting is gradient descent yet again, the same engine from lesson 3, taking steps that happen to be whole trees. The next lesson leaves ensembles behind and returns to drawing a single boundary, but with a sharp new principle: maximize the margin. That is the support vector machine.