Wisdom of crowds: random forests
What you’ll learn
This is lesson 6 of Track 10, in Phase 2 (Teaching a machine to decide). By the end you will be able to explain how a random forest combines many decision trees to fix a single tree’s instability, and why a crowd of individually-overfit trees generalizes better than any one of them. The one capability to walk away with: explain how bagging many trees reduces overfitting, and compute a forest’s prediction by vote or average.
The track structurally mirrors StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.
Where this fits
The previous lesson ended on the decision tree’s fatal flaw: a single tree is unstable and overfits. This lesson is the fix, and it is the first ensemble in the track, combining many models into one. It also sets up a contrast that runs through the rest of the phase: a random forest combines trees in parallel (bagging), while the next lesson, boosting, combines them in sequence. Both are tree ensembles; understanding how they differ is the payoff of these two lessons.
Before you start
Prerequisite: Lesson 5, Asking the right questions: decision trees. A random forest is built entirely out of decision trees, so you need to know what one tree is, how it splits, and why a single tree overfits and is unstable, which is the exact problem this lesson solves.
By the end, you’ll be able to
- Describe a random forest as an ensemble of trees that votes or averages
- Explain bagging and random feature subsets, and why diversity is essential
- Explain why averaging independent errors lowers variance
- Compute a forest’s prediction by majority vote and by average
- Name the interpretability trade and the out-of-bag error bonus
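As a preview of the vote-and-average objective, here is a minimal sketch in plain Python. It assumes each tree has already produced its prediction for one input. The function names `forest_vote` and `forest_average` and the five sample predictions are hypothetical, made up for illustration, and not taken from any library or dataset.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    counts = Counter(tree_predictions)
    # most_common(1) returns a list holding the single (label, count) pair
    # with the highest count. With an odd number of trees and two classes
    # there can be no tie.
    return counts.most_common(1)[0][0]

def forest_average(tree_predictions):
    """Regression: return the mean of the trees' numeric predictions."""
    return sum(tree_predictions) / len(tree_predictions)

# Hypothetical outputs from five trees for one input (made-up values).
class_votes = ["spam", "ham", "spam", "spam", "ham"]
price_estimates = [200.0, 220.0, 210.0, 190.0, 230.0]

print(forest_vote(class_votes))         # spam (3 votes to 2)
print(forest_average(price_estimates))  # 210.0
```

The forest adds nothing beyond this combining step at prediction time: the trees answer separately, and the forest reports the majority label or the mean.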
Time and difficulty
- Read time: about 12 minutes
- Practice time: about 15 minutes (a vote-and-average exercise, a remove-the-randomness thought experiment, and flashcards)
- Difficulty: standard