Cheatsheet: Asking the right questions: decision trees
Anatomy
| Part | What it is |
|---|---|
| Root | the first question, asked of every example |
| Internal node | a follow-up question |
| Branch | the answer to a question (yes / no) |
| Leaf | the prediction (a class, or a number for regression trees) |
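
The four parts above map naturally onto a small recursive data structure. The following is a minimal sketch, not a reference implementation: the class names `Node` and `Leaf`, the example questions, and the loan labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str  # a class label; a regression tree would store a number


@dataclass
class Node:
    question: str               # the yes/no question asked at this node
    yes: Union["Node", Leaf]    # branch followed when the answer is yes
    no: Union["Node", Leaf]     # branch followed when the answer is no


# The outermost Node is the root; the nested Node is an internal node.
tree = Node(
    question="income >= 50000?",
    yes=Node(question="debt >= 10000?", yes=Leaf("deny"), no=Leaf("approve")),
    no=Leaf("deny"),
)


def count_leaves(t):
    """Count the leaves (predictions) reachable from t."""
    if isinstance(t, Leaf):
        return 1
    return count_leaves(t.yes) + count_leaves(t.no)


print(count_leaves(tree))
```

Every path from the root to a leaf is one chain of answers, which is what makes a tree's prediction auditable.
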
Making a prediction
| Step | Action |
|---|---|
| 1 | Start at the root |
| 2 | Answer the question, follow the matching branch |
| 3 | Repeat at each node |
| 4 | Stop at a leaf; its value is the prediction |
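
The four steps can be sketched as a short loop. This example assumes a tree stored as nested dictionaries with hypothetical keys (`feature`, `threshold`, `yes`, `no`, `prediction`) and made-up loan features; it illustrates the traversal only, not how any particular library stores its trees.

```python
# Hypothetical tree: internal nodes ask "feature >= threshold?",
# leaves carry a "prediction".
tree = {
    "feature": "income",
    "threshold": 50000,
    "yes": {
        "feature": "debt",
        "threshold": 10000,
        "yes": {"prediction": "deny"},
        "no": {"prediction": "approve"},
    },
    "no": {"prediction": "deny"},
}


def predict(node, example):
    """Walk from the root to a leaf and return the leaf's value."""
    # Step 1: start at the root (the node passed in).
    while "prediction" not in node:
        # Step 2: answer the question and follow the matching branch.
        answer = example[node["feature"]] >= node["threshold"]
        node = node["yes"] if answer else node["no"]
        # Step 3: the loop repeats at the next node.
    # Step 4: a leaf was reached; its value is the prediction.
    return node["prediction"]


print(predict(tree, {"income": 60000, "debt": 2000}))
```

The cost of a prediction is proportional to the depth of the path taken, not to the size of the whole tree.
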
How the tree is built
| Idea | Detail |
|---|---|
| Goal at each node | pick the question that best separates the classes |
| Pure group | all one class (impurity = 0) |
| Impure group | mixed classes (with two classes, impurity peaks at a 50/50 mix) |
| Measure | Gini impurity or entropy |
| Rule | choose the split that most reduces impurity, then repeat on each branch |
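
As a rough illustration of the two measures, here is a sketch that computes both from per-class counts, using the standard definitions (Gini: one minus the sum of squared class proportions; entropy: minus the sum of p times log2 p). The function names and the count-based input format are assumptions made for this example.

```python
import math


def gini(counts):
    """Gini impurity from per-class counts, e.g. [4, 4]."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Entropy in bits from per-class counts; empty classes contribute 0."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


for counts in ([4, 0], [3, 1], [4, 4]):
    print(counts, gini(counts), entropy(counts))
```

Both measures are 0 for a pure group and largest for an even mix, so in practice they tend to rank candidate splits similarly.
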
Split-quality example (parent: 4 approve, 4 deny)
| Split | Result | Verdict |
|---|---|---|
| X | (4 approve, 0 deny) and (0 approve, 4 deny) | pure leaves, impurity 0, chosen |
| Y | (2 approve, 2 deny) and (2 approve, 2 deny) | still 50/50, reduces nothing |
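
The verdicts in this table can be checked numerically. The sketch below assumes Gini impurity as the measure and scores each split by the size-weighted impurity of its children; the helper names `weighted_gini` and `gains` are hypothetical.

```python
def gini(counts):
    """Gini impurity from (approve, deny) counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Average child impurity, weighted by the share of examples in each child."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini(c) for c in children)


parent = (4, 4)  # 4 approve, 4 deny
splits = {
    "X": [(4, 0), (0, 4)],
    "Y": [(2, 2), (2, 2)],
}

# Impurity reduction = parent impurity minus weighted child impurity.
gains = {name: gini(parent) - weighted_gini(ch) for name, ch in splits.items()}
best = max(gains, key=gains.get)
print(gains, best)
```

Split X removes all of the parent's impurity (0.5), while split Y removes none, which is why X is chosen.
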
Overfitting control
| Lever | Effect |
|---|---|
| Max depth | stop splitting once a set depth is reached |
| Min leaf size | require at least N examples in every leaf |
| Pruning | grow fully, then cut branches that do not help |
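
To see the first two levers in action, here is a deliberately small tree builder on a single numeric feature. It is a sketch under stated assumptions, not a production algorithm: the data are made up, `build`, `max_depth`, and `min_leaf` are hypothetical names, Gini impurity is assumed as the measure, and pruning is not implemented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(rows, depth=0, max_depth=10, min_leaf=1):
    """Grow a tree on (x, label) rows; returns a label (leaf) or a dict (node)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the group is pure or the depth limit is reached.
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority
    best = None  # (weighted impurity, threshold)
    for t in sorted({x for x, _ in rows})[:-1]:  # candidate questions "x <= t?"
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # min leaf size: reject splits that create tiny leaves
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None or best[0] >= gini(labels):
        return majority  # no allowed split reduces impurity
    t = best[1]
    return {
        "threshold": t,
        "yes": build([r for r in rows if r[0] <= t], depth + 1, max_depth, min_leaf),
        "no": build([r for r in rows if r[0] > t], depth + 1, max_depth, min_leaf),
    }


def count_leaves(tree):
    if not isinstance(tree, dict):
        return 1
    return count_leaves(tree["yes"]) + count_leaves(tree["no"])


# Made-up data with one "noisy" approve at x = 4.
labels = ["deny", "deny", "deny", "approve", "deny", "approve", "approve", "approve"]
rows = list(zip(range(1, 9), labels))

full = build(rows)                  # effectively unlimited
shallow = build(rows, max_depth=1)  # a single question
wide = build(rows, min_leaf=2)      # no leaf smaller than 2 examples
print(count_leaves(full), count_leaves(shallow), count_leaves(wide))
```

The unrestricted tree carves out a leaf just for the single noisy example, while either limit stops it from doing so at the cost of a less exact fit to the training data.
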
Strengths vs weaknesses
| Strengths | Weaknesses |
|---|---|
| Interpretable (auditable path) | Unstable (high variance) |
| Captures non-linear patterns | Overfits easily without limits |
| Handles mixed feature types | A small data change can reshape the whole tree |
| No input rescaling needed | (instability is reduced by averaging many trees: next lesson) |