References: Asking the right questions: decision trees
Source material
Source material (conceptual spine):
• StatQuest with Josh Starmer: "Decision and Classification Trees, Clearly Explained!!!"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=_L39rN6gz7Y
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)
Related StatQuest videos:
• "Regression Trees, Clearly Explained": https://www.youtube.com/watch?v=g9c66TUylZ4
• "How to Prune Trees (Cost Complexity Pruning)": https://www.youtube.com/watch?v=D0efHEJsfHo
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Classification module)
  Repository: https://github.com/microsoft/ML-For-Beginners
  License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material for educational purposes. All rights to the original videos and curriculum remain with their creators.
What this lesson draws from each source
- StatQuest’s “Decision and Classification Trees” anchors the core mechanics: how a tree splits, how impurity (Gini) guides the choice of question, and how trees overfit. Its companion videos cover regression trees and cost-complexity pruning, the two extensions named but not detailed here.
- Microsoft’s ML-For-Beginners Classification module is the hands-on companion: it builds tree-based classifiers in Python with scikit-learn.
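As a quick refresher on how Gini impurity guides the choice of question, here is a minimal sketch in plain Python. It is not code from StatQuest or the Microsoft module: the `applicants` data, the function names, and the midpoint-threshold search are all assumptions made up for illustration. It scores each candidate threshold on one numeric feature by the size-weighted Gini impurity of the two groups it creates and keeps the lowest-scoring one.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(left, right):
    """Impurity of a split: each side's Gini weighted by its share of the rows."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

def split_labels(rows, threshold):
    """Ask the question 'is value < threshold?' and return the labels on each side."""
    left = [label for value, label in rows if value < threshold]
    right = [label for value, label in rows if value >= threshold]
    return left, right

def best_threshold(rows):
    """Try the midpoint between each pair of adjacent distinct values; keep the purest split."""
    values = sorted({value for value, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: weighted_gini(*split_labels(rows, t)))

# Hypothetical toy data: (income in thousands, loan decision).
applicants = [(20, "no"), (35, "no"), (50, "yes"), (65, "yes"), (80, "yes"), (30, "yes")]

threshold = best_threshold(applicants)
left, right = split_labels(applicants, threshold)
print(threshold, round(weighted_gini(left, right), 3))
```

On this toy data the best question is "income < 42.5?", which leaves one side perfectly pure. A full tree repeats the same search on each resulting group and across every feature, which is also why an unrestrained tree can keep splitting until it memorizes the training rows.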
The loan-approval worked example, the explicit framing of instability as the setup for random forests, and the connection to auditable decisions are Clawdemy’s own.
Going deeper
- StatQuest with Josh Starmer. Beyond the classification-tree video, StatQuest covers regression trees, pruning, and (in the next lessons of this track) the ensembles built from trees. If any step here felt fast, the matching StatQuest video slows it down with pictures.
- Microsoft ML-For-Beginners: Classification. Project-based classification lessons in scikit-learn, where you can fit and visualize a real decision tree.
Adjacent topics
- Random forests (the next lesson). The direct fix for a single tree’s instability: grow many trees on different views of the data and average their votes.
- Gradient-boosted trees. A different way to combine trees, building them in sequence so each corrects the last. The subject of the boosting lesson later in this phase.
- Bias and variance (Phase 4). The framework that makes “an unrestrained tree overfits” precise, and explains why limiting depth trades a little accuracy for a lot of stability.
Community discussion
None selected for this lesson. Decision trees are thoroughly covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.