
Cheatsheet: Asking the right questions: decision trees

| Part | What it is |
|---|---|
| Root | the first question, asked of every example |
| Internal node | a follow-up question |
| Branch | the answer to a question (yes / no) |
| Leaf | the prediction (a class, or a number for regression trees) |

| Step | Action |
|---|---|
| 1 | Start at the root |
| 2 | Answer the question and follow the matching branch |
| 3 | Repeat at each internal node you reach |
| 4 | Stop at a leaf; its value is the prediction |
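
The four steps can be sketched in Python. This is a minimal illustration that assumes every internal node asks a numeric threshold question ("is feature ≤ threshold?"). The `Node`/`Leaf` classes, `predict`, and the loan features `debt_ratio` and `income` are hypothetical names chosen for this sketch. They are not part of any library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: str  # the prediction returned when an example lands here


@dataclass
class Node:
    feature: str      # the feature this question looks at
    threshold: float  # the question: is example[feature] <= threshold?
    yes: "Tree"       # branch followed when the answer is yes
    no: "Tree"        # branch followed when the answer is no


Tree = Union[Leaf, Node]


def predict(tree: Tree, example: dict) -> str:
    node = tree                        # 1. start at the root
    while isinstance(node, Node):      # 3. repeat at each internal node
        # 2. answer the question and follow the matching branch
        node = node.yes if example[node.feature] <= node.threshold else node.no
    return node.value                  # 4. stop at a leaf; its value is the prediction


# Hypothetical loan tree: feature names and thresholds are made up for illustration.
loan_tree = Node("debt_ratio", 0.4,
                 yes=Node("income", 30_000, yes=Leaf("deny"), no=Leaf("approve")),
                 no=Leaf("deny"))

print(predict(loan_tree, {"debt_ratio": 0.2, "income": 50_000}))  # approve
print(predict(loan_tree, {"debt_ratio": 0.6, "income": 90_000}))  # deny
```

Each example follows exactly one root-to-leaf path. That path is also what makes a tree's decisions easy to audit.
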
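
| Idea | Detail |
|---|---|
| Goal at each node | pick the question that best separates the classes |
| Pure group | all one class (impurity = 0) |
| Impure group | mixed classes (with two classes, impurity peaks at a 50/50 mix) |
| Measure | Gini impurity or entropy |
| Rule | choose the split that most reduces impurity (parent impurity minus the size-weighted impurity of the child groups), then repeat on each branch |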
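
Here is a minimal sketch of both impurity measures and the greedy split rule. It assumes numeric features and yes/no questions of the form "is feature ≤ t?", with candidate thresholds taken from the observed values. `gini`, `entropy`, and `best_split` are illustrative names rather than a library API. When two splits tie, the first one found wins.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 for a pure group)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum p * log2(p) over the classes present (0 for a pure group)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, measure=gini):
    """Greedy rule: try every question 'is row[f] <= t?' and keep the one that
    most reduces impurity. Returns (reduction, feature_index, threshold) or None."""
    parent = measure(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            yes = [y for r, y in zip(rows, labels) if r[f] <= t]
            no = [y for r, y in zip(rows, labels) if r[f] > t]
            if not yes or not no:
                continue  # everyone answers the same way: separates nothing
            # Children are weighted by their share of the examples.
            children = (len(yes) * measure(yes) + len(no) * measure(no)) / len(labels)
            reduction = parent - children
            if best is None or reduction > best[0]:
                best = (reduction, f, t)
    return best


# Hypothetical loan data: feature 0 separates the classes, feature 1 does not.
rows = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = ["approve"] * 4 + ["deny"] * 4
print(gini(labels), entropy(labels))  # 0.5 1.0
print(best_split(rows, labels))       # (0.5, 0, 0)
```

A full tree applies `best_split` again to each child group until the groups are pure or a stopping rule kicks in.
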

Split-quality example (parent: 4 approve, 4 deny)


| Split | Result | Verdict |
|---|---|---|
| X | (4 approve, 0 deny) and (0 approve, 4 deny) | both children pure (impurity 0): chosen |
| Y | (2 approve, 2 deny) and (2 approve, 2 deny) | both children still 50/50: no impurity reduction |
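
The same verdicts can be worked out with Gini impurity G. Write G(a, d) for a group with a approve and d deny examples, and weight each child by its share of the 8 examples. Entropy reaches the same verdict: 1 bit for the parent and for each Y child, and 0 for each X child.

```latex
\begin{aligned}
G(S) &= 1 - \sum_k p_k^2 \\
G(4,4) &= 1 - \left(\tfrac{4}{8}\right)^2 - \left(\tfrac{4}{8}\right)^2 = 0.5 \quad \text{(parent)} \\
G(4,0) &= G(0,4) = 1 - 1^2 = 0, \qquad G(2,2) = 1 - \left(\tfrac{2}{4}\right)^2 - \left(\tfrac{2}{4}\right)^2 = 0.5 \\
\text{X: } &\tfrac{4}{8}\,G(4,0) + \tfrac{4}{8}\,G(0,4) = 0 \;\Rightarrow\; \text{reduction } 0.5 - 0 = 0.5 \\
\text{Y: } &\tfrac{4}{8}\,G(2,2) + \tfrac{4}{8}\,G(2,2) = 0.5 \;\Rightarrow\; \text{reduction } 0.5 - 0.5 = 0
\end{aligned}
```
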
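
| Lever | Effect |
|---|---|
| Max depth | stop splitting once a node reaches a set depth |
| Min leaf size | require at least N examples in every leaf |
| Pruning | grow the tree fully, then cut back branches that do not help (e.g., that do not improve accuracy on validation data) |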
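
The following compact sketch shows how the first two levers act as stopping rules while the tree grows. It assumes numeric features, Gini impurity, and "≤ threshold" questions. `build`, `max_depth`, and `min_leaf` are hypothetical names that echo the table rather than any specific library. Pruning works on an already-grown tree and is not shown. In the returned tree, a leaf is a label and an internal node is a `(feature, threshold, yes_subtree, no_subtree)` tuple.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(rows, labels, depth=0, max_depth=3, min_leaf=1):
    """Grow a tree greedily. A leaf is a label; an internal node is
    (feature, threshold, yes_subtree, no_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop if the group is already pure or the depth limit is reached.
    if gini(labels) == 0 or depth >= max_depth:
        return majority
    best = None  # (weighted child impurity, feature, threshold, yes_idx, no_idx)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            yes = [i for i, r in enumerate(rows) if r[f] <= t]
            no = [i for i, r in enumerate(rows) if r[f] > t]
            # Min leaf size: skip any split that leaves a child too small (or empty).
            if min(len(yes), len(no)) < max(min_leaf, 1):
                continue
            score = (len(yes) * gini([labels[i] for i in yes])
                     + len(no) * gini([labels[i] for i in no])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, yes, no)
    # No allowed split, or none that lowers impurity: make this node a leaf.
    if best is None or best[0] >= gini(labels):
        return majority
    _, f, t, yes, no = best

    def grow(idx):
        return build([rows[i] for i in idx], [labels[i] for i in idx],
                     depth + 1, max_depth, min_leaf)

    return (f, t, grow(yes), grow(no))


# Hypothetical 1-feature data; the "a" at x = 5 plays the role of a noisy example.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = ["a", "a", "a", "b", "a", "b", "b", "b"]

print(build(X, y, max_depth=10))              # (0, 3, 'a', (0, 5, (0, 4, 'b', 'a'), 'b'))
print(build(X, y, max_depth=1))               # (0, 3, 'a', 'b')
print(build(X, y, max_depth=10, min_leaf=3))  # (0, 3, 'a', 'b')
```

Without a limit, the tree spends two extra questions isolating the lone "a" at x = 5. Either limit keeps the simpler one-question tree.
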

| Strengths | Weaknesses |
|---|---|
| Interpretable (every prediction follows an auditable path) | Unstable (high variance) |
| Captures non-linear patterns | Overfits easily without limits |
| Handles mixed feature types | A small data change can reshape the whole tree |
| No input rescaling needed | (variance is reduced by averaging many trees: next lesson) |
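
The following sketch illustrates that instability on a hypothetical toy dataset. `rows`, `labels`, and `root_question` are made up for illustration. The root question is the one with the lowest size-weighted Gini impurity. Adding a single approved example is enough to change which feature the root asks about, so everything grown beneath it changes as well.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def root_question(rows, labels):
    """Return the feature index whose 'value <= t' split gives the lowest
    size-weighted Gini impurity (ties go to the first one found)."""
    best_score, best_feature = None, None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            yes = [y for r, y in zip(rows, labels) if r[f] <= t]
            no = [y for r, y in zip(rows, labels) if r[f] > t]
            if not yes or not no:
                continue  # this question does not split the group
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if best_score is None or score < best_score:
                best_score, best_feature = score, f
    return best_feature


# Hypothetical data: two binary features, five approvals then five denials.
rows = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1),
        (0, 1), (1, 1), (1, 1), (1, 1), (1, 1)]
labels = ["approve"] * 5 + ["deny"] * 5

print(root_question(rows, labels))                           # 0
print(root_question(rows + [(1, 0)], labels + ["approve"]))  # 1
```

Before the change, feature 0 wins (weighted Gini 0.32 vs 0.375). Afterwards, feature 1 wins (about 0.341 vs 0.388), so the root and every subtree under it are rebuilt from a different partition.
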