
References: Asking the right questions: decision trees

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Decision and Classification Trees, Clearly Explained!!!"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=_L39rN6gz7Y
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)
Related StatQuest videos:
• "Regression Trees, Clearly Explained" : https://www.youtube.com/watch?v=g9c66TUylZ4
• "How to Prune Trees (Cost Complexity Pruning)" : https://www.youtube.com/watch?v=D0efHEJsfHo
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Classification module)
Repository: https://github.com/microsoft/ML-For-Beginners
License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.
  • StatQuest’s “Decision and Classification Trees” anchors the core mechanics: how a tree splits, how impurity (Gini) guides the choice of question, and how trees overfit. Its companion videos cover regression trees and cost-complexity pruning, the two extensions named but not detailed here.
  • Microsoft’s ML-For-Beginners Classification module is the hands-on companion: it builds tree-based classifiers in Python with scikit-learn.
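As a rough illustration of how Gini impurity guides the choice of question, the sketch below scores every candidate threshold on one numeric feature and keeps the one with the lowest weighted impurity. The applicant data, the function names (`gini`, `weighted_gini`, `best_threshold`), and the midpoint-threshold convention are assumptions made for this example; they are not taken from the StatQuest video or the Microsoft module.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def weighted_gini(left, right):
    """Impurity of a split: each side's Gini weighted by its share of rows."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


def best_threshold(rows):
    """Try midpoints between consecutive distinct values; return (threshold, impurity)."""
    values = sorted({x for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in rows if x < t]
        right = [y for x, y in rows if x >= t]
        score = weighted_gini(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: (income in thousands, loan approved?)
applicants = [(25, "no"), (32, "no"), (41, "no"), (48, "yes"), (60, "yes"), (75, "yes")]

threshold, impurity = best_threshold(applicants)
print(f"Best question: income < {threshold}? (weighted Gini = {impurity})")
```

A real tree repeats this search over every feature at every node, which is why it keeps splitting until the leaves are pure unless something stops it.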

The loan-approval worked example, the explicit framing of instability as the setup for random forests, and the connection to auditable decisions are Clawdemy’s own.

  • StatQuest with Josh Starmer. Beyond the classification-tree video, StatQuest covers regression trees, pruning, and (in the next lessons of this track) the ensembles built from trees. If any step here felt fast, the matching StatQuest video slows it down with pictures.
  • Microsoft ML-For-Beginners: Classification. Project-based classification lessons in scikit-learn, where you can fit and visualize a real decision tree.
  • Random forests (the next lesson). The direct fix for a single tree’s instability: grow many trees, each on a different bootstrap sample of the data with a random subset of features considered at each split, and combine their votes.
  • Gradient-boosted trees. A different way to combine trees, building them in sequence so each corrects the last. The subject of the boosting lesson later in this phase.
  • Bias and variance (Phase 4). The framework that makes “an unrestrained tree overfits” precise, and explains why limiting depth trades a little training accuracy (more bias) for much lower variance.
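For readers who want to try the depth trade-off mentioned above before opening the Microsoft lessons, here is a minimal scikit-learn sketch. It uses the Iris dataset bundled with scikit-learn purely as a stand-in (an assumption of this example, not the dataset used in the Microsoft module), and the variable names `deep` and `shallow` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data: the Iris dataset ships with scikit-learn, so nothing is downloaded.
iris = load_iris()
X, y = iris.data, iris.target

# An unrestricted tree keeps splitting until its leaves are pure.
deep = DecisionTreeClassifier(random_state=0).fit(X, y)

# Capping the depth gives a smaller tree that is easier to read and audit.
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print("unrestricted: depth", deep.get_depth(), "train accuracy", deep.score(X, y))
print("max_depth=2:  depth", shallow.get_depth(), "train accuracy", shallow.score(X, y))

# Text rendering of the shallow tree's questions, one line per node.
print(export_text(shallow, feature_names=list(iris.feature_names)))
```

Training accuracy alone flatters the unrestricted tree; comparing the two on held-out data (for example with `train_test_split`) is the fairer test of which one generalizes.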

None selected for this lesson. Decision trees are thoroughly covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.