
Asking the right questions: decision trees

This is lesson 5 of Track 10, in Phase 2 (Teaching a machine to decide). By the end you will be able to read a decision tree, trace any example down its branches to a prediction, and explain how the tree was built by repeatedly choosing the question that best separates the classes. The one capability to walk away with: given a small tree, predict what it does for a new example and say why each split was chosen, in plain language.
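To make tracing concrete, here is a minimal Python sketch of a hand-built two-question tree. The feature names borrow the classic iris flower measurements, but the thresholds are illustrative values picked for this example, not learned from data. `Node`, `Leaf`, and `trace` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str  # the class reported when an example lands here


@dataclass
class Node:
    feature: str      # which measurement the question looks at
    threshold: float  # the question asked is "feature <= threshold?"
    yes: "Tree"       # branch followed when the answer is yes
    no: "Tree"        # branch followed when the answer is no


Tree = Union[Node, Leaf]


def trace(tree: Tree, example: dict) -> tuple:
    """Walk one example from the root to a leaf, recording every question asked."""
    path = []
    node = tree
    while isinstance(node, Node):
        value = example[node.feature]
        answer = value <= node.threshold
        path.append(f"is {node.feature} ({value}) <= {node.threshold}? {'yes' if answer else 'no'}")
        node = node.yes if answer else node.no
    return node.prediction, path


# Hypothetical tree with illustrative thresholds (not learned from data).
tree = Node("petal_length", 2.45,
            yes=Leaf("setosa"),
            no=Node("petal_width", 1.75, yes=Leaf("versicolor"), no=Leaf("virginica")))

prediction, path = trace(tree, {"petal_length": 4.5, "petal_width": 1.5})
for step in path:
    print(step)
print("prediction:", prediction)
```

Running it prints the two questions the example meets (`is petal_length (4.5) <= 2.45? no`, then `is petal_width (1.5) <= 1.75? yes`) followed by `prediction: versicolor`. The recorded path is exactly the "say why" part of the lesson goal: each line is one question and the answer that sent the example down that branch.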

The track structurally mirrors StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.

Logistic regression, the previous lesson, classifies by drawing a single straight boundary. The decision tree is the first model in this track with a fundamentally different shape: a flowchart of questions that can carve out non-linear regions no single straight line can. It is also the building block for the next two lessons: random forests (next) combine many trees to fix a single tree’s instability, and boosting combines trees in a different way. This lesson is therefore the foundation for all of Phase 2’s ensemble lessons.
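The claim that a tree can carve out regions a straight line cannot is easy to check on the classic XOR pattern, where opposite corners of a square share a class. In this minimal Python sketch the depth-2 tree’s thresholds are hand-chosen for illustration, and the brute-force search over a coarse grid of lines is only a spot check. The general fact is that no straight line separates XOR, so the best any single line can do is 3 of the 4 points.

```python
import itertools

# XOR pattern: opposite corners share a class.
points = [((0, 0), "A"), ((1, 1), "A"), ((0, 1), "B"), ((1, 0), "B")]


def tree_predict(x, y):
    """Depth-2 tree: two threshold questions split the plane into four boxes."""
    if x <= 0.5:
        return "A" if y <= 0.5 else "B"
    return "B" if y <= 0.5 else "A"


def line_predict(w1, w2, b, x, y):
    """One straight boundary w1*x + w2*y + b = 0: class A on one side, B on the other."""
    return "A" if w1 * x + w2 * y + b > 0 else "B"


def accuracy(predict):
    return sum(predict(x, y) == label for (x, y), label in points) / len(points)


tree_acc = accuracy(tree_predict)

# Spot check: try every line whose weights and offset come from a coarse grid.
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
best_line_acc = max(
    accuracy(lambda x, y, w=w: line_predict(*w, x, y))
    for w in itertools.product(grid, repeat=3)
)
print(tree_acc, best_line_acc)  # 1.0 0.75
```

The tree gets every point right by asking about `x` first and then about `y` inside each half, which is exactly the "flowchart of questions" shape; every line tried tops out at 75%.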

Prerequisite: Lesson 4, From a line to a probability: logistic regression. You need the idea of a classifier (a model that predicts a category), because this lesson contrasts the tree’s question-asking approach with logistic regression’s single straight boundary. No math is needed beyond comparing numbers, since each of a tree’s questions is a simple threshold test on one value.
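Because each question is just `value <= threshold`, a numeric feature offers only a handful of genuinely different questions: any threshold between the same two neighboring values sends the same examples each way. The sketch below assumes the common convention of placing each candidate threshold at the midpoint between neighboring distinct values; `candidate_thresholds` is a hypothetical helper and the ages are made-up illustration data.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct values: each one is a different '<=' question."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


ages = [22, 35, 35, 41, 58]  # made-up illustration data
for t in candidate_thresholds(ages):
    yes = [a for a in ages if a <= t]
    no = [a for a in ages if a > t]
    print(f"age <= {t}?  yes: {yes}  no: {no}")
```

Five ages with four distinct values give three candidate questions (`age <= 28.5`, `age <= 38.0`, `age <= 49.5`); building a tree means choosing among such questions, which is all comparing numbers.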

  • Read a tree’s anatomy and trace an example to its prediction
  • Explain how a tree is built by reducing impurity at each split
  • Define pure and impure groups, and explain what Gini impurity and entropy each measure
  • Explain why unrestrained trees overfit and how to prevent it
  • Identify a single tree’s instability as the weakness random forests fix
  • Read time: about 11 minutes
  • Practice time: about 15 minutes (a tree-tracing exercise, a split-quality comparison, and flashcards)
  • Difficulty: standard
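The objectives above (impurity, choosing splits, overfitting) can be previewed in one small Python sketch. The Gini impurity and entropy formulas are the standard definitions. Everything else is a simplifying assumption for illustration: a single numeric feature, midpoint thresholds, the hypothetical helpers `best_split` and `count_leaves`, and the tiny made-up datasets. The `max_depth` argument mirrors the idea behind scikit-learn’s `DecisionTreeClassifier(max_depth=...)`, but this is not that implementation.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Chance two labels drawn at random (with replacement) disagree: 0 if pure, 0.5 for a 50/50 mix."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: 0 for a pure group, 1 for a 50/50 two-class mix."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(rows, measure=gini):
    """rows: list of (value, label). Try every midpoint threshold and return
    (threshold, weighted child impurity) with the lowest score, or None if no split exists."""
    distinct = sorted({v for v, _ in rows})
    best = None
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        left = [lab for v, lab in rows if v <= t]
        right = [lab for v, lab in rows if v > t]
        # Each child's impurity counts in proportion to how many examples it holds.
        score = (len(left) * measure(left) + len(right) * measure(right)) / len(rows)
        if best is None or score < best[1]:
            best = (t, score)
    return best


def count_leaves(rows, max_depth=None, depth=0):
    """Greedily grow a one-feature tree and return its number of leaves.
    Stops at a pure group, at max_depth (None = no limit), or when no split exists."""
    labels = [lab for _, lab in rows]
    if gini(labels) == 0 or (max_depth is not None and depth >= max_depth):
        return 1
    split = best_split(rows)
    if split is None:  # all values identical, so no question can separate them
        return 1
    t = split[0]
    left = [r for r in rows if r[0] <= t]
    right = [r for r in rows if r[0] > t]
    return count_leaves(left, max_depth, depth + 1) + count_leaves(right, max_depth, depth + 1)


clean = [(1, "no"), (2, "no"), (3, "no"), (4, "yes"), (5, "yes"), (6, "yes")]
print(gini(["no", "no", "yes", "yes"]), entropy(["no", "no", "yes", "yes"]))  # 0.5 1.0
print(best_split(clean))  # (3.5, 0.0)

# Noisy labels: with no limit the tree keeps splitting until every leaf is pure.
noisy = list(zip(range(1, 9), ["no", "yes", "no", "no", "yes", "no", "yes", "yes"]))
print(count_leaves(noisy), count_leaves(noisy, max_depth=1))
```

On the clean data the best question, `value <= 3.5`, leaves two perfectly pure groups, so its weighted impurity is 0. On the noisy data an unlimited tree must end with at least one leaf per run of same-labeled points (6 runs here), effectively memorizing the noise, while `max_depth=1` stops after one question and keeps just 2 leaves. That trade-off is the overfitting the lesson covers.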