Practice: Wisdom of crowds: random forests

Self-check

Seven short questions. Try to answer each one before opening the collapsible.

1. What is a random forest, in one sentence?

Show answer

An ensemble of many decision trees that combines their answers, taking the majority vote for classification or the average for regression. It is the wisdom of a crowd of trees.

2. What is bagging?

Show answer

Bootstrap aggregating: train each tree on its own bootstrap sample, a random sample of the data drawn with replacement (so some rows repeat and others are missing), then aggregate the trees’ predictions. Every tree sees a slightly different dataset.
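As a minimal sketch of the sampling step only, assuming a toy dataset of ten row indices (the helper name bootstrap_sample and the data are made up for illustration):

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: some repeat, some are left out."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
rows = list(range(10))   # stand-in for ten training rows (hypothetical data)

for tree_id in range(3):
    sample = bootstrap_sample(rows, rng)
    left_out = sorted(set(rows) - set(sample))
    print(f"tree {tree_id}: sample={sorted(sample)} left out={left_out}")
```

Each printed sample has the same size as the original data, but repeats some rows and omits others, which is what gives every tree its own view of the data.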

3. What is the second source of randomness, and why is it needed?

Show answer

At each split, a tree may consider only a random subset of the features. Without this restriction, one strongly predictive feature would dominate the first split in nearly every tree, making the trees almost identical. Random feature subsets de-correlate the trees.
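A small sketch of the per-split feature draw. The feature names are hypothetical, and the default subset size (the square root of the feature count) is a common rule of thumb for classification, assumed here rather than taken from this lesson:

```python
import math
import random

def candidate_features(feature_names, rng, k=None):
    """Pick the random subset of features that one split is allowed to consider."""
    if k is None:
        # Assumed default: about sqrt(number of features), a common rule of thumb.
        k = max(1, round(math.sqrt(len(feature_names))))
    return rng.sample(feature_names, k)

rng = random.Random(0)
# Hypothetical loan-style features, for illustration only.
features = ["income", "age", "debt", "tenure", "region",
            "savings", "employer", "history", "zip"]

for split_id in range(4):
    print(f"split {split_id}: may consider {candidate_features(features, rng)}")
```

Because a strong feature such as "income" is missing from many of these draws, other features get a chance to shape the splits.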

4. How can averaging a pile of overfit trees beat any single one?

Show answer

Because the trees’ errors are largely independent. They agree on the real signal (it is present in every tree’s data) and disagree randomly on the noise. Averaging reinforces the signal and cancels much of the scattered error.
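A toy simulation can make the cancellation visible. It assumes an idealized case in which every tree predicts the same true value plus its own fully independent Gaussian error; real trees are only partly independent, and the numbers below are invented:

```python
import random
import statistics

rng = random.Random(0)
SIGNAL = 23.0    # hypothetical true value every tree is trying to predict
NOISE_SD = 5.0   # each tree's own error, drawn independently (an idealization)

def tree_prediction():
    """One tree: the shared signal plus that tree's private noise."""
    return SIGNAL + rng.gauss(0.0, NOISE_SD)

def forest_prediction(n_trees):
    """The forest: the average of n_trees independent tree predictions."""
    return statistics.fmean(tree_prediction() for _ in range(n_trees))

single = [tree_prediction() for _ in range(2000)]
forest = [forest_prediction(100) for _ in range(2000)]

print("spread of one tree:     ", round(statistics.stdev(single), 2))
print("spread of 100-tree mean:", round(statistics.stdev(forest), 2))
```

Both sets of predictions center on the same signal, but the averaged predictions are far less scattered.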

5. In bias-variance terms, what does a forest change relative to one deep tree?

Show answer

A single deep tree has low bias but high variance (it is unstable). Averaging many such trees keeps the low bias but sharply lowers the variance, which improves generalization to new data.
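One standard way to quantify the variance reduction, under the idealized assumption that the n trees’ predictions T_i each have variance σ² and every pair of trees has the same correlation ρ (these symbols are introduced here for illustration):

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} T_i\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{n}\,\sigma^2
```

As n grows, the second term shrinks toward zero and the variance approaches ρσ². Adding trees removes the part of the variance the trees do not share, and lowering the correlation between trees lowers the floor that remains.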

6. What is out-of-bag error?

Show answer

Each tree’s bootstrap sample leaves out about a third of the data (roughly 37% for a large dataset). You can test each tree on the examples it never saw and average the results, getting a free generalization estimate without a separate validation set.
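The "about a third" figure can be checked with a short sketch. A given row is missed by one draw with probability 1 - 1/n, so it is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e for large n. The dataset size below is arbitrary:

```python
import math
import random

def left_out_fraction(n_rows, rng):
    """Fraction of rows that one bootstrap sample of size n_rows never picks."""
    picked = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(picked) / n_rows

rng = random.Random(0)
n_rows = 10_000                          # arbitrary dataset size for the demo
simulated = left_out_fraction(n_rows, rng)
expected = (1 - 1 / n_rows) ** n_rows    # chance a given row is missed by every draw

print(f"simulated: {simulated:.3f}")
print(f"expected:  {expected:.3f}")
print(f"1/e:       {math.exp(-1):.3f}")
```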

7. What do you give up by using a forest instead of one tree?

Show answer

Interpretability. You can read one tree as a flowchart and explain its decision; you cannot read hundreds of voting trees. A forest is close to a black box (though it still gives feature-importance estimates). It is also larger and slower.

Try it yourself: vote and average

Part A (classification). Seven trees vote on whether to approve a loan: approve, approve, deny, approve, deny, approve, approve. What does the forest predict?

Part B (regression). A four-tree forest predicts a delivery time, in minutes: 20, 25, 22, 25. What does the forest predict?

Show answer

Part A: count the votes: 5 approve, 2 deny. Majority wins, so the forest predicts approve. Two trees disagreed, but the crowd carried the decision.

Part B: average the predictions: (20 + 25 + 22 + 25) / 4 = 92 / 4 = 23 minutes. Classification forests vote; regression forests average.
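The same two calculations as a short Python sketch, using only the numbers from the exercise:

```python
from collections import Counter

# Part A: classification forests take the majority vote.
votes = ["approve", "approve", "deny", "approve", "deny", "approve", "approve"]
label, count = Counter(votes).most_common(1)[0]
print(label, count)   # approve 5

# Part B: regression forests average the trees' predictions.
minutes = [20, 25, 22, 25]
average = sum(minutes) / len(minutes)
print(average)        # 23.0
```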

Try it yourself: remove the randomness

Suppose you build a “forest” but switch off both sources of randomness: every tree is trained on the full dataset, and every split may use all features. What kind of forest do you get, and why is it pointless?

Show answer

You get a forest of nearly identical trees. With the same data and the same features available, the greedy build process makes (almost) the same splits every time, so every tree is essentially a copy of the others. Voting among identical trees just returns what a single tree would say, so all the extra trees buy you nothing. The diversity created by bagging and random feature subsets is exactly what makes the crowd wiser than the individual. Without it, the crowd is one tree wearing a costume.
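A rough sketch of this effect, using a one-split "stump" on made-up one-dimensional data as a stand-in for a full tree (the learner, the data, and all names are illustrative assumptions):

```python
import random

def fit_stump(points):
    """Greedy 1-D stump: choose the threshold with the fewest misclassifications.

    points is a list of (x, label) pairs with label 0 or 1.
    The stump predicts 1 when x > threshold.
    """
    best = None
    for t in sorted({x for x, _ in points}):
        errors = sum((x > t) != bool(y) for x, y in points)
        if best is None or errors < best[0]:
            best = (errors, t)
    return best[1]

rng = random.Random(0)
# Hypothetical noisy data: the label tends to be 1 when x is above 5.
xs = [rng.uniform(0, 10) for _ in range(40)]
data = [(x, int(x + rng.gauss(0, 1.5) > 5)) for x in xs]

# No randomness: every "tree" is trained on the full dataset.
no_randomness = {fit_stump(data) for _ in range(5)}
# Bagging: every "tree" is trained on its own bootstrap sample.
bagged = {fit_stump([rng.choice(data) for _ in data]) for _ in range(5)}

print("distinct thresholds without randomness:", len(no_randomness))
print("distinct thresholds with bagging:      ", len(bagged))
```

Without randomness, the deterministic learner returns the same threshold every time, so the five "trees" collapse into one. With bootstrap samples the thresholds will usually differ from run to run.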

Flashcards

Ten cards.

Q. What is a random forest?

An ensemble of many decision trees that combines their answers: majority vote for classification, average for regression.

Q. What is bagging (bootstrap aggregating)?

Training each tree on its own bootstrap sample: a random sample drawn with replacement, so each tree sees a slightly different dataset.

Q. What is the second source of randomness in a forest?

Random feature subsets: each split may consider only a random subset of features, which de-correlates the trees so they are not all dominated by one strong feature.

Q. Why does averaging overfit trees help?

Their errors are largely independent. Trees agree on the real signal and disagree randomly on noise; averaging reinforces the signal and cancels the scattered errors.

Q. What does a forest change in bias-variance terms?

It keeps the trees’ low bias but sharply lowers their variance, so it generalizes better than any single unstable deep tree.

Q. What is out-of-bag error?

A free generalization estimate: test each tree on the ~1/3 of data its bootstrap sample left out, and average. No separate validation set needed.

Q. What do you trade away with a random forest?

Interpretability (you cannot read hundreds of voting trees), plus more size and slower speed. It is close to a black box, though feature importances survive.

Q. Does adding more trees cause overfitting?

No, not the way deepening one tree does. Accuracy mostly improves as trees are added and then plateaus; you cannot overfit a forest just by making it larger.

Q. What happens if you remove the randomness from a forest?

The trees become nearly identical, so the forest is no better than a single tree. Diversity is what makes the crowd wise.

Q. How does a random forest combine trees, versus boosting?

A forest builds its trees independently, so they can be trained in parallel, then votes or averages (bagging). Boosting builds trees in sequence, each one correcting the errors of those before it.