
References: Wisdom of crowds: random forests

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Random Forests Part 1: Building, using and evaluating"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=J4Wdy0Wc_xQ
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Classification module)
Repository: https://github.com/microsoft/ML-For-Beginners
License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.
  • StatQuest’s “Random Forests Part 1” anchors the mechanics: building trees from bootstrap samples, combining them by vote, and evaluating with out-of-bag error. The random-feature-subset detail and the bias-variance framing of why averaging helps are built out here as the lesson’s core.
  • Microsoft’s ML-For-Beginners Classification module is the hands-on companion: it fits random-forest classifiers in Python with scikit-learn.
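The mechanics in the first bullet can be sketched from scratch: bootstrap samples, a random feature subset at each split, a majority vote, and out-of-bag (OOB) error. This is a minimal illustration under stated assumptions, not StatQuest's or scikit-learn's own implementation. It uses scikit-learn's `DecisionTreeClassifier` as the base tree, and that tree's `max_features="sqrt"` setting supplies the per-split random feature subset. It uses the Iris dataset only for convenience. The helper names `fit_forest`, `predict_vote`, and `oob_error` are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: grow n_trees trees, each on a bootstrap sample.

    max_features="sqrt" makes every split consider only a random subset of
    the features; that extra randomness separates a random forest from
    plain bagging.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    trees, in_bag = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X[idx], y[idx])
        mask = np.zeros(n, dtype=bool)
        mask[idx] = True  # True for rows this tree trained on
        trees.append(tree)
        in_bag.append(mask)
    return trees, in_bag


def predict_vote(trees, X):
    """Hypothetical helper: majority vote (assumes labels are 0..k-1)."""
    votes = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n)
    return np.array([np.bincount(col).argmax() for col in votes.T])


def oob_error(trees, in_bag, X, y):
    """Hypothetical helper: score each row only with trees that never saw it."""
    preds = np.stack([t.predict(X) for t in trees]).astype(int)  # (n_trees, n)
    bag = np.stack(in_bag)  # (n_trees, n)
    wrong = scored = 0
    for i in range(len(X)):
        oob_votes = preds[~bag[:, i], i]  # votes from out-of-bag trees only
        if oob_votes.size == 0:  # row landed in every bootstrap sample
            continue
        wrong += int(np.bincount(oob_votes).argmax() != y[i])
        scored += 1
    return wrong / scored


X, y = load_iris(return_X_y=True)
trees, in_bag = fit_forest(X, y)
print("OOB error:", oob_error(trees, in_bag, X, y))
print("training accuracy:", (predict_vote(trees, X) == y).mean())
```

Training accuracy is optimistic because every tree has seen most of the rows. The OOB error avoids that without a separate test set. Each row is left out of roughly a third of the bootstrap samples, so those trees act as a held-out panel for it.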

The “wisdom of crowds” framing, the five-tree vote example, and the explicit contrast between bagging (parallel, independent trees) and boosting (sequential trees, next lesson) are Clawdemy’s own.

  • Boosting (the next lesson). The other major way to combine trees: build them in sequence, each one correcting the errors of the last, rather than independently and in parallel.
  • Bias and variance (Phase 4). The framework that makes “averaging lowers variance” precise. The random forest is the cleanest example of giving up little or no bias in exchange for a large cut in variance.
  • Feature importance. The forest’s partial recovery of interpretability: a ranking of which features mattered most across all the trees, useful when you need some insight into a near-black-box model.
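The claim that averaging lowers variance has a standard precise form. Assume each tree's prediction has variance $\sigma^2$ and any two trees have pairwise correlation $\rho$; this identically distributed setup is a simplifying assumption. Then the average of $B$ trees has variance:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
  = \frac{1}{B^2}\Big(B\sigma^2 + B(B-1)\,\rho\,\sigma^2\Big)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Adding trees (larger $B$) shrinks only the second term. The first term, $\rho\sigma^2$, is a floor set by how correlated the trees are. The random feature subset at each split is a way to push $\rho$ down, which is why a random forest usually beats plain bagging.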
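The hands-on side, in the spirit of the Microsoft companion, can be sketched with scikit-learn's `RandomForestClassifier`. This is an illustrative sketch, not code taken from the ML-For-Beginners repository. The Wine dataset, `n_estimators=200`, and the train/test split are assumptions chosen for the example. It reports the OOB score and the impurity-based feature-importance ranking described in the feature importance bullet.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

forest = RandomForestClassifier(
    n_estimators=200,     # size of the "crowd" (illustrative choice)
    max_features="sqrt",  # random feature subset considered at each split
    oob_score=True,       # score each training row with trees that never saw it
    random_state=0,
).fit(X_train, y_train)

print(f"OOB accuracy:  {forest.oob_score_:.3f}")
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")

# Impurity-based importances, averaged over all trees and normalized to sum to 1.
importances = forest.feature_importances_
for i in np.argsort(importances)[::-1][:5]:
    print(f"{data.feature_names[i]:<30} {importances[i]:.3f}")
```

The ranking says which features the trees leaned on most, not how each one moves a prediction. That is why it counts as only a partial recovery of interpretability.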

None selected for this lesson. Random forests are well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.