Summary: Wisdom of crowds: random forests
A random forest fixes the single tree’s instability with the wisdom of crowds: grow hundreds of trees, each on a slightly different slice of the data and features, and let them vote. Because the trees’ errors are largely independent, they tend to cancel under averaging while the shared signal survives, which sharply lowers variance. That is why the crowd generalizes far better than any one overfit tree. This summary is the scan version of the full lesson.
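The variance claim can be made concrete with a standard identity, under simplifying assumptions that go beyond this lesson: suppose the forest averages $B$ trees whose predictions $T_1, \dots, T_B$ at a given point each have variance $\sigma^2$ and pairwise correlation $\rho$ (these symbols are introduced here for illustration only).

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

With fully independent trees ($\rho = 0$) the variance falls as $\sigma^2 / B$; with identical trees ($\rho = 1$) averaging gains nothing. Real trees sit in between, so the less correlated they are, the more the average helps.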
Core ideas
- A forest is an ensemble of decision trees. To predict, run the example through every tree and combine: majority vote for classification, average for regression.
- Diversity is the whole point. A crowd of identical trees is no wiser than one. The forest manufactures difference in two ways:
  - Bagging: each tree trains on its own bootstrap sample (a random sample drawn with replacement), so every tree sees slightly different data.
  - Random feature subsets: each split may use only a random subset of features, so no single strong feature makes all the trees identical.
- Why it works: each tree still overfits, but the trees’ errors are largely independent. Trees agree on the real signal and disagree randomly on noise; averaging keeps the signal and cancels much of the noise. In bias-variance terms, the low bias of deep trees is kept and the variance is slashed.
- Free bonus: out-of-bag error. Each training example is predicted using only the trees whose bootstrap sample left it out, which gives a generalization estimate without setting aside a separate validation set.
- The trade: you lose the single tree’s interpretability (a forest is near black-box) and gain accuracy and stability. Feature importances survive; the readable decision path does not.
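The ideas above can be sketched in plain Python. This is a toy illustration under stated assumptions, not how any particular library implements forests: the helper names (`grow_tree`, `grow_forest`, `predict_forest`, `oob_error`), the Gini split criterion, the synthetic data, and the settings (25 trees, 2 of 4 features tried per split) are all chosen here for the example.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def gini(labels):
    """Gini impurity: 0 when every label agrees."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, rows, n_try, rng):
    """Grow an unpruned tree; each split tries a fresh random feature subset."""
    labels = [y[i] for i in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # pure leaf
    best = None
    for f in rng.sample(range(len(X[0])), n_try):
        values = sorted({X[i][f] for i in rows})
        for t in values[:-1]:                 # skip the max so both sides are non-empty
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                          # sampled features are constant on these rows
        return majority(labels)
    _, f, t, left, right = best
    return (f, t, grow_tree(X, y, left, n_try, rng),
            grow_tree(X, y, right, n_try, rng))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def grow_forest(X, y, n_trees, n_try, seed=0):
    """Bagging: every tree trains on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        forest.append((grow_tree(X, y, rows, n_try, rng), set(rows)))
    return forest

def predict_forest(forest, x):
    """Classification: majority vote across all trees."""
    return majority([predict_tree(tree, x) for tree, _ in forest])

def oob_error(forest, X, y):
    """Score each training row using only the trees that never saw it."""
    wrong = scored = 0
    for i, x in enumerate(X):
        votes = [predict_tree(tree, x) for tree, seen in forest if i not in seen]
        if votes:
            scored += 1
            wrong += majority(votes) != y[i]
    return wrong / scored

# Synthetic data (an assumption for the demo): two informative features,
# two pure-noise features, and 10% of the labels flipped.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(4)] for _ in range(160)]
y = [int(x[0] + x[1] > 1.0) for x in X]
y = [1 - label if data_rng.random() < 0.1 else label for label in y]
X_train, y_train, X_test, y_test = X[:120], y[:120], X[120:], y[120:]

forest = grow_forest(X_train, y_train, n_trees=25, n_try=2)
accuracy = sum(predict_forest(forest, x) == t
               for x, t in zip(X_test, y_test)) / len(X_test)
print("test accuracy:", accuracy)
print("out-of-bag error:", oob_error(forest, X_train, y_train))
```

Each tree here is grown until its leaves are pure, so individual trees fit the flipped labels in their own bootstrap sample; the vote is what smooths that out. In practice a library implementation would be the sensible choice, and this sketch only mirrors the mechanics.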
What changes for you
The random forest is the model a data scientist often reaches for first on spreadsheet-shaped data: accurate out of the box, hard to overfit, little tuning. Knowing how it works demystifies why “just throw a random forest at it” is such common advice, and why it is a strong baseline that fancier models must beat to earn their place. The bigger lesson is the wisdom-of-crowds principle itself: combining many diverse, independently-wrong models beats any single one, an idea that reappears across machine learning, including averaging several runs of a large model for a steadier answer. The next lesson keeps the many-trees idea but combines them in sequence rather than in parallel, each tree fixing the last one’s mistakes, which is boosting.