
Cheatsheet: Wisdom of crowds: random forests

| Item | Detail |
| --- | --- |
| What it is | an ensemble of many decision trees |
| Combine (classification) | majority vote |
| Combine (regression) | average of the trees’ predictions |
| Why it beats one tree | independent errors cancel; shared signal survives |
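
The "independent errors cancel" row can be checked with a small calculation. The sketch below assumes an idealized case: every tree is right with the same probability (0.7 here, a made-up figure) and the trees err fully independently. Real trees are correlated, so actual gains are smaller. The function name `majority_vote_accuracy` is illustrative only.

```python
from math import comb

def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that a strict majority of n_trees independent voters is right.

    Assumes an odd n_trees and identical, independent per-tree accuracy.
    """
    needed = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )

# Accuracy of the vote grows with the number of (idealized) trees.
for n in (1, 5, 101):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

Under these assumptions a single 70%-accurate tree becomes a roughly 84%-accurate vote with five trees, and the vote keeps improving as more trees are added.
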

| Source of randomness | What it does |
| --- | --- |
| Bagging | each tree trains on a bootstrap sample (random draw with replacement) |
| Random feature subsets | each split considers only a random subset of features |
| Combined effect | trees are individually decent and make mistakes in different places |
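
The two sources of randomness above can be sketched with the standard library alone. This is a minimal illustration, not a full forest: the helper names, the 10-row dataset, and the choice of 3 out of 9 features (roughly the square root, a common heuristic for classification) are all assumptions made for the example.

```python
import random

def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at one split."""
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Bagging: some rows repeat in the sample, others are left out entirely.
in_bag = bootstrap_indices(10, rng)
out_of_bag = sorted(set(range(10)) - set(in_bag))

# Random feature subset: only these columns may be used at this split.
features = random_feature_subset(n_features=9, k=3, rng=rng)

print("in-bag rows:    ", in_bag)
print("out-of-bag rows:", out_of_bag)
print("split features: ", features)
```

Each tree would get its own bootstrap sample, and each split inside that tree would get a fresh feature subset, which is what pushes the trees to make mistakes in different places.
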

| Component | Single deep tree | Random forest |
| --- | --- | --- |
| Bias | low | low (kept) |
| Variance | high (unstable) | sharply lowered |
| Generalization | overfits | better on new data |
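
The variance row can be made concrete with the standard result for an average of identically distributed predictors. The symbols are assumptions introduced here: $B$ is the number of trees, $T_b(x)$ is the prediction of tree $b$, $\sigma^2$ is the variance of a single tree, and $\rho$ is the pairwise correlation between trees.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding trees shrinks only the second term, and random feature subsets help by lowering $\rho$ in the first term. Averaging leaves the expected prediction unchanged, which is why bias stays roughly where a single tree had it.
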

| Tree | Vote |
| --- | --- |
| 1 | SPAM |
| 2 | SPAM |
| 3 | NOT SPAM |
| 4 | SPAM |
| 5 | NOT SPAM |
| Result | 3 vs 2 -> SPAM |
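
The five-tree vote above can be reproduced in a few lines. This is a minimal sketch of the two combination rules. The function names are illustrative, and the regression inputs are made-up numbers rather than values from the cheatsheet.

```python
from collections import Counter
from statistics import mean

def combine_classification(votes):
    """Return the label with the most votes (ties go to the label seen first)."""
    return Counter(votes).most_common(1)[0][0]

def combine_regression(predictions):
    """Return the average of the trees' numeric predictions."""
    return mean(predictions)

# Votes from the five trees in the example.
votes = ["SPAM", "SPAM", "NOT SPAM", "SPAM", "NOT SPAM"]
print(combine_classification(votes))  # SPAM (3 votes to 2)

# Hypothetical regression outputs from three trees.
print(combine_regression([2.0, 3.0, 4.0]))  # 3.0
```
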

| Item | Note |
| --- | --- |
| Out-of-bag error | free generalization estimate from each tree’s left-out ~1/3 of data |
| Lost | interpretability (cannot read hundreds of trees) |
| Gained | accuracy, stability, low tuning, feature importances |
| More trees | help then plateau; do not cause overfitting |
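
The "~1/3 left out" figure in the out-of-bag row follows from sampling with replacement: a given row is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below checks this both exactly and by simulation. The function names and the dataset size of 1000 are assumptions for illustration.

```python
import math
import random

def expected_oob_fraction(n_rows):
    """Chance that a given row is never drawn in n_rows draws with replacement."""
    return (1 - 1 / n_rows) ** n_rows

def simulated_oob_fraction(n_rows, rng):
    """Fraction of rows left out of one simulated bootstrap sample."""
    drawn = {rng.randrange(n_rows) for _ in range(n_rows)}
    return 1 - len(drawn) / n_rows

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print("exact for n=1000:  ", expected_oob_fraction(1000))
print("limit 1/e:         ", math.exp(-1))
print("one simulated draw:", simulated_oob_fraction(1000, rng))
```

Because each tree never saw its own left-out rows, scoring every row using only the trees that missed it gives an error estimate without a separate validation split.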