References: Wisdom of crowds: random forests

Source material

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Random Forests Part 1: Building, using and evaluating"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=J4Wdy0Wc_xQ
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)

Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Classification module)
  Repository: https://github.com/microsoft/ML-For-Beginners
  License: MIT

Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.

What this lesson draws from each source

StatQuest’s “Random Forests Part 1” anchors the mechanics: building trees from bootstrap samples, combining them by vote, and evaluating with out-of-bag error. This lesson builds out two pieces as its core: the random-feature-subset detail and the bias-variance explanation of why averaging helps.
Microsoft’s ML-For-Beginners Classification module is the hands-on companion: it fits random-forest classifiers in Python with scikit-learn.
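
To make the bootstrap and out-of-bag mechanics named above concrete, here is a minimal sketch using only the Python standard library. It is not code from either source: the dataset size, the seed, and the helper name bootstrap_indices are illustrative assumptions. It draws one bootstrap sample of row indices and counts the rows that were never drawn, which are the rows a tree built on that sample could be tested on.

```python
import random


def bootstrap_indices(n_rows, rng):
    """Draw n_rows row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


rng = random.Random(0)   # fixed seed so the run is repeatable (assumption)
n_rows = 1000            # hypothetical dataset size

sample = bootstrap_indices(n_rows, rng)

# Rows that appear at least once are "in the bag" for this tree;
# rows that were never drawn are "out of bag" and act as held-out data.
in_bag = set(sample)
out_of_bag = [i for i in range(n_rows) if i not in in_bag]
oob_fraction = len(out_of_bag) / n_rows

print(f"rows drawn: {len(sample)}, distinct rows: {len(in_bag)}")
# For large n the out-of-bag share tends toward 1/e, roughly 0.37.
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

In scikit-learn this bookkeeping happens internally: constructing RandomForestClassifier with oob_score=True makes the fitted model expose the out-of-bag accuracy as oob_score_.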

The “wisdom of crowds” framing, the five-tree vote example, and the explicit contrast between bagging (parallel, independent trees) and boosting (sequential trees, next lesson) are Clawdemy’s own.
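
A five-tree vote can be sketched in a few lines. The labels and the individual votes below are hypothetical stand-ins, not the lesson's actual example; the point is only that the forest's answer is the most common answer among its trees.

```python
from collections import Counter

# Hypothetical predictions from five trees for a single input.
tree_votes = ["cat", "dog", "cat", "cat", "dog"]


def majority_vote(votes):
    """Return the most common label and its share of the votes."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


label, share = majority_vote(tree_votes)
print(label, share)  # cat 0.6
```

With an odd number of trees and two classes a tie cannot occur, which is one reason small illustrative forests often use five trees.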

Going deeper

StatQuest with Josh Starmer. Random Forests Part 1 plus a Part 2 on missing data. StatQuest also has clear standalone explainers on bagging and on the bias-variance tradeoff that this lesson leans on.
Microsoft ML-For-Beginners: Classification. Project-based lessons where you fit and evaluate a random forest on real data in scikit-learn.

Adjacent topics

Boosting (the next lesson). The other major way to combine trees: build them in sequence, each one correcting the errors of the last, rather than independently and in parallel.
Bias and variance (Phase 4). The framework that makes “averaging lowers variance” precise. The random forest is the cleanest example of giving up little or nothing on bias in exchange for a large cut in variance.
Feature importance. The forest’s partial recovery of interpretability: a ranking of which features mattered most across all the trees, useful when you need some insight into a near-black-box model.
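
The “averaging lowers variance” claim in the bias-and-variance item above can be checked with a small simulation. This is a sketch under a strong assumption: each “tree” is modeled as the true value plus independent Gaussian noise, and all constants are made up for illustration. Real trees in a forest are correlated, so the reduction in practice is smaller than what this shows; random feature subsets exist to weaken that correlation.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for repeatability (assumption)
TRUE_VALUE = 10.0        # hypothetical quantity every "tree" tries to estimate
NOISE_SD = 1.0           # spread of a single tree's error
N_TREES = 25             # trees averaged per forest
N_TRIALS = 2000          # repetitions used to measure variance


def noisy_estimate():
    """One unbiased but noisy estimate, standing in for a single tree."""
    return rng.gauss(TRUE_VALUE, NOISE_SD)


# Variance of a single estimator across many trials.
single = [noisy_estimate() for _ in range(N_TRIALS)]

# Variance of the average of N_TREES independent estimators.
averaged = [
    statistics.fmean([noisy_estimate() for _ in range(N_TREES)])
    for _ in range(N_TRIALS)
]

var_single = statistics.variance(single)
var_avg = statistics.variance(averaged)

print(f"single estimator variance: {var_single:.3f}")
print(f"average of {N_TREES} variance: {var_avg:.3f}")
print(f"mean of averages: {statistics.fmean(averaged):.3f}")
```

Under the independence assumption the variance of the average shrinks by about a factor of N_TREES while its mean stays at the true value, which is the sense in which averaging cuts variance without adding bias.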

Community discussion

None selected for this lesson. Random forests are well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.