
Wisdom of crowds: random forests

This is lesson 6 of Track 10, in Phase 2 (Teaching a machine to decide). By the end you will be able to explain how a random forest combines many decision trees to fix a single tree’s instability, and why a crowd of individually overfit trees generalizes better than any one of them. The one capability to walk away with: explain how bagging many trees reduces overfitting, and compute a forest’s prediction by vote or average.
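One way to see why the crowd helps is a short variance calculation. This is a sketch under idealized assumptions that are not part of the lesson text: each of the n trees makes an error ε_i with mean zero and the same variance σ², and every pair of trees has the same error correlation ρ. These symbols are introduced here only for illustration.

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i\right)
  = \rho\,\sigma^{2} + \frac{(1-\rho)\,\sigma^{2}}{n}
% Independent trees (\rho = 0): the variance shrinks to \sigma^{2}/n.
% Identical trees   (\rho = 1): the variance stays at \sigma^{2}, so averaging gains nothing.
```

Under these assumptions, adding trees shrinks only the second term. The first term is set by how correlated the trees are, which is why a forest needs its trees to differ from one another.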

The track structurally mirrors StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.

The previous lesson ended on the decision tree’s fatal flaw: a single tree is unstable and overfits. This lesson is the fix, and it introduces the first ensemble in the track, a method that combines many models into one. It also sets up a contrast that runs through the rest of the phase: a random forest combines trees in parallel (bagging), while boosting, the subject of the next lesson, combines them in sequence. Both are tree ensembles; understanding how they differ is the payoff of these two lessons.

Prerequisite: Lesson 5, Asking the right questions: decision trees. A random forest is built entirely out of decision trees, so you need to know what one tree is, how it splits, and why a single tree overfits and is unstable. That instability is the exact problem this lesson solves.

  • Describe a random forest as an ensemble of trees that votes or averages
  • Explain bagging and random feature subsets, and why diversity is essential
  • Explain why averaging independent errors lowers variance
  • Compute a forest’s prediction by majority vote and by average
  • Name the interpretability trade-off and the out-of-bag error bonus

  • Read time: about 12 minutes
  • Practice time: about 15 minutes (a vote-and-average exercise, a remove-the-randomness thought experiment, and flashcards)
  • Difficulty: standard
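As a preview of the vote-and-average objective above, here is a minimal Python sketch of how a forest might turn its trees' outputs into one prediction. The function names `forest_vote` and `forest_average` and the five tree outputs are hypothetical, made up for illustration; they are not taken from the lesson or from any library.

```python
from collections import Counter
from statistics import mean


def forest_vote(tree_predictions):
    """Classification: return the label predicted by the most trees.

    Ties go to the label that appears first in the list; this is a
    simplification chosen for the sketch, not a rule from the lesson.
    """
    label, _count = Counter(tree_predictions).most_common(1)[0]
    return label


def forest_average(tree_predictions):
    """Regression: return the mean of the trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs from five trees for a single input (made-up values).
class_votes = ["spam", "ham", "spam", "spam", "ham"]
price_estimates = [200.0, 220.0, 210.0, 190.0, 230.0]

print(forest_vote(class_votes))         # spam (3 votes to 2)
print(forest_average(price_estimates))  # 210.0
```

Voting is the combination rule for class labels and averaging is the rule for numeric targets. In both cases no single tree decides the answer alone.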