Asking the right questions: decision trees
What you’ll learn
This is lesson 5 of Track 10, in Phase 2 (Teaching a machine to decide). By the end you will be able to read a decision tree, trace any example down its branches to a prediction, and explain how the tree was built by repeatedly choosing the question that best separates the classes. The one capability to walk away with: given a small tree, predict what it does for a new example and say why each split was chosen, in plain language.
The track structurally mirrors StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.
Where this fits
Logistic regression, the previous lesson, classifies by drawing a single straight boundary. The decision tree is the first model in this track to take a fundamentally different shape: a flowchart of questions that can carve out non-linear regions a straight line never could. It is also the building block for the next two lessons. Random forests (next) combine many trees to fix a single tree’s instability, and boosting combines them a different way. So this lesson is the foundation of the whole ensemble run in Phase 2.
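To make the "flowchart of questions" idea concrete, here is a minimal sketch of a hand-written two-question tree. The feature (`temp_c`), the thresholds (18 and 26), and the labels are all invented for illustration; they do not come from this lesson's data. The point is only that two threshold questions can isolate a middle band, which a single cut-off cannot do.

```python
def predict_comfort(temp_c: float) -> str:
    """Trace one example down a tiny, hypothetical decision tree."""
    # Question 1 (root): is it too cold?
    if temp_c < 18.0:
        return "uncomfortable"
    # Question 2 (asked only on the "not too cold" branch): is it too hot?
    if temp_c >= 26.0:
        return "uncomfortable"
    # Leaf reached when both answers are "no".
    return "comfortable"


for t in (10.0, 22.0, 30.0):
    print(t, predict_comfort(t))
```

Each `if` is one internal node and each `return` is one leaf, so tracing an example is just following the answers from top to bottom.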
Before you start
Prerequisite: Lesson 4, From a line to a probability: logistic regression. You need the idea of a classifier (predicting a category), because this lesson contrasts the tree’s question-asking approach with logistic regression’s single straight boundary. No math beyond comparing numbers; a tree’s questions are simple thresholds.
By the end, you’ll be able to
- Read a tree’s anatomy and trace an example to its prediction
- Explain how a tree is built by reducing impurity at each split
- Define pure and impure groups and what Gini impurity or entropy measures
- Explain why unrestrained trees overfit and how to prevent it
- Identify a single tree’s instability as the weakness random forests fix
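As a preview of the impurity objectives above, the following sketch computes Gini impurity and entropy for a group of labels, then scores a candidate split by the size-weighted impurity of its two child groups. The function names and the toy labels are hypothetical, chosen only for illustration; the formulas are the standard definitions, not necessarily the exact presentation this lesson uses later.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: sum of -p * log2(p) over the classes present."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n) for count in Counter(labels).values())


def split_impurity(left, right, measure=gini):
    """Impurity after a split: child impurities weighted by group size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


# Toy data (invented): a perfectly mixed parent group of 4 "yes" and 4 "no".
parent = ["yes"] * 4 + ["no"] * 4

# Candidate split A leaves both children mixed 3-to-1.
split_a = (["yes", "yes", "yes", "no"], ["yes", "no", "no", "no"])
# Candidate split B separates the classes completely.
split_b = (["yes"] * 4, ["no"] * 4)

print("parent gini:", gini(parent))
print("split A gini:", split_impurity(*split_a))
print("split B gini:", split_impurity(*split_b))
```

A pure group scores 0 under both measures, and a tree builder prefers the split with the lowest weighted impurity, which here is split B.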
Time and difficulty
- Read time: about 11 minutes
- Practice time: about 15 minutes (a tree-tracing exercise, a split-quality comparison, and flashcards)
- Difficulty: standard