Practice: Asking the right questions: decision trees

Self-check

Seven short questions. Try to answer each one before opening the collapsible.

1. What are the three kinds of parts in a decision tree?

Show answer

The root (the first question, asked of every example), internal nodes (follow-up questions), and leaves (the predictions: a class label, or a number for a regression tree).

2. How does a tree make a prediction for one example?

Show answer

Start at the root, answer its question, follow the matching branch to the next question, and keep going until you reach a leaf. That leaf’s value is the prediction. No arithmetic, just a path of questions.

3. When the tree is built, what makes one candidate question better than another?

Show answer

How much it reduces impurity: how well it separates the classes into purer groups. A split that sends all of one class one way and all of the other class the other way is ideal; a split that leaves both branches mixed is useless.

4. What do “pure” and “impure” mean, and what measures them?

Show answer

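A group is pure if every example in it belongs to one class, and impure if the classes are mixed. Gini impurity and entropy each put a number on this: 0 for a perfectly pure group, rising to a maximum when the classes are evenly mixed (50/50 with two classes, where Gini is 0.5 and entropy is 1 bit). The tree picks the split that most reduces this number.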
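As a minimal sketch of the two measures named above, the functions below compute Gini impurity and entropy from a list of class labels. The function names and the "yes"/"no" labels are illustrative assumptions, not part of the lesson.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: sum of -p * log2(p) over the classes present."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

pure = ["yes", "yes", "yes", "yes"]   # all one class
mixed = ["yes", "yes", "no", "no"]    # 50/50 mix of two classes

print("pure: ", gini(pure), entropy(pure))
print("mixed:", gini(mixed), entropy(mixed))
```

Both measures agree on the extremes (0 for the pure group, their two-class maximum for the even mix); they differ only in how they score the groups in between.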

5. What happens if you let a tree grow without any limit?

Show answer

It keeps splitting until every leaf is pure, which on noisy data often means leaves that hold a single training example. The tree then fits the training data perfectly by memorizing its noise. That is overfitting. Trees are reined in with depth limits, minimum leaf sizes, or pruning.
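To make the effect of a depth limit concrete, here is a toy tree builder for a single numeric feature. It is a sketch under stated assumptions: the data is made up, the helper names (`build`, `predict`, `count_leaves`, `accuracy`) are hypothetical, and it uses Gini impurity with midpoint thresholds rather than any particular library's algorithm.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build(points, max_depth=None, depth=0):
    """points: list of (x, label). Returns a nested dict, or a label for a leaf."""
    labels = [lab for _, lab in points]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or the (optional) depth limit is reached.
    if gini(labels) == 0.0 or (max_depth is not None and depth >= max_depth):
        return majority
    best = None
    xs = sorted({x for x, _ in points})
    for lo, hi in zip(xs, xs[1:]):          # candidate thresholds: midpoints
        t = (lo + hi) / 2
        left = [p for p in points if p[0] <= t]
        right = [p for p in points if p[0] > t]
        score = (len(left) * gini([l for _, l in left])
                 + len(right) * gini([l for _, l in right])) / len(points)
        if best is None or score < best[0]:  # lower weighted impurity is better
            best = (score, t, left, right)
    if best is None:                         # all x values identical: cannot split
        return majority
    _, t, left, right = best
    return {"threshold": t,
            "left": build(left, max_depth, depth + 1),
            "right": build(right, max_depth, depth + 1)}

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree

def count_leaves(tree):
    if not isinstance(tree, dict):
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

def accuracy(tree, points):
    return sum(predict(tree, x) == lab for x, lab in points) / len(points)

# Made-up data: mostly "a" below 4.5 and "b" above, with two noisy points.
points = [(1, "a"), (2, "a"), (3, "a"), (4, "b"),
          (5, "a"), (6, "b"), (7, "b"), (8, "b")]

full_tree = build(points)                    # no limit
shallow_tree = build(points, max_depth=1)    # a single question

for name, tree in [("unlimited", full_tree), ("max_depth=1", shallow_tree)]:
    print(name, "leaves:", count_leaves(tree), "train accuracy:", accuracy(tree, points))
```

On this toy data the unrestricted tree reaches 100% training accuracy with four leaves, carving out extra leaves just to isolate the noisy points. The depth-1 tree stops at two leaves and leaves the point at x = 5 misclassified, which is the simpler pattern one would hope to generalize.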

6. Name two strengths and the key weakness of a single decision tree.

Show answer

Strengths (any two): interpretable, captures non-linear patterns, handles mixed feature types, needs no rescaling. Key weakness: instability (high variance). A small change in the data can produce a very different tree, and a single tree overfits easily.

7. What problem with single trees does the next lesson (random forests) set out to fix?

Show answer

Their instability. Because one tree swings wildly with small data changes, the fix is to grow many trees on slightly different views of the data and combine their votes, which averages away much of the variance.

Try it yourself: trace the tree

Here is a small animal classifier. Trace each of the three animals down to its leaf.

                 [ Has feathers? ]
                  /             \
                yes              no
                /                  \
        [ Can fly? ]           [ Has fins? ]
         /        \              /         \
       yes        no           yes          no
       /            \          /             \
    BIRD         PENGUIN     FISH          MAMMAL

Animals:
  1. feathers: yes, can fly: no
  2. feathers: no,  has fins: yes
  3. feathers: no,  has fins: no

Show answer

1. Has feathers? yes -> Can fly? no  -> PENGUIN
2. Has feathers? no  -> Has fins? yes -> FISH
3. Has feathers? no  -> Has fins? no  -> MAMMAL

Each animal follows a path of questions to a single leaf. Notice you never needed the questions on the branches you did not take: animal 2 was never asked “can fly?” because it has no feathers.
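The same trace can be sketched in code. The nested-dictionary layout and the key names (`has_feathers`, `can_fly`, `has_fins`) are assumptions chosen for illustration; the tree itself is the one drawn above.

```python
# The animal tree as nested dicts: a dict is a question, a string is a leaf.
TREE = {
    "question": "has_feathers",
    "yes": {"question": "can_fly", "yes": "BIRD", "no": "PENGUIN"},
    "no": {"question": "has_fins", "yes": "FISH", "no": "MAMMAL"},
}

def predict(node, animal):
    """Walk from the root to a leaf, recording each question that was asked."""
    path = []
    while isinstance(node, dict):
        question = node["question"]
        answer = "yes" if animal[question] else "no"
        path.append(f"{question}? {answer}")
        node = node[answer]          # follow the matching branch
    return node, path

# Each animal lists only the features on its own path.
animals = [
    {"has_feathers": True, "can_fly": False},
    {"has_feathers": False, "has_fins": True},
    {"has_feathers": False, "has_fins": False},
]

for animal in animals:
    label, path = predict(TREE, animal)
    print(" -> ".join(path), "->", label)
```

Animal 2 has no `can_fly` entry at all, yet the lookup never fails, because that question sits on a branch the animal does not take.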

Try it yourself: which split is better?

A node holds 8 examples: 4 “approve” and 4 “deny”. Two candidate questions would split them like this:

Split X:  left -> (4 approve, 0 deny)    right -> (0 approve, 4 deny)
Split Y:  left -> (2 approve, 2 deny)    right -> (2 approve, 2 deny)

Which split does the tree choose, and why?

Show answer

The tree chooses Split X. Its two groups are perfectly pure (one is all approve, the other all deny), so impurity drops to 0, the best possible separation. Split Y leaves both branches at 50/50, exactly as mixed as the parent, so it reduces impurity by nothing and teaches the tree nothing. The whole tree-building rule is just “pick the split that most reduces impurity,” and Split X reduces it completely while Split Y does not reduce it at all.
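The comparison can be checked numerically. The sketch below assumes Gini impurity as the measure (entropy would rank the two splits the same way) and weights each branch by its share of the parent's examples; the function names are illustrative.

```python
def gini(counts):
    """Gini impurity from per-class counts, e.g. (approve, deny)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def impurity_reduction(parent, children):
    """Parent impurity minus the size-weighted impurity of the branches."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

parent = (4, 4)                    # 4 approve, 4 deny
split_x = [(4, 0), (0, 4)]         # both branches pure
split_y = [(2, 2), (2, 2)]         # both branches still 50/50

print("Split X reduction:", impurity_reduction(parent, split_x))
print("Split Y reduction:", impurity_reduction(parent, split_y))
```

The parent's Gini impurity is 0.5. Split X removes all of it (a reduction of 0.5), while Split Y removes none (a reduction of 0), so X wins.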

Flashcards

Ten cards.

Q. What are the three parts of a decision tree?

The root (first question), internal nodes (follow-up questions), and leaves (predictions: a class, or a number for regression trees).

Q. How does a tree predict for one example?

Start at the root, answer each question, follow the matching branch, and stop at the leaf you reach. That leaf is the prediction.

Q. What makes one split better than another when building a tree?

How much it reduces impurity: how well it separates the classes into purer groups. The tree picks the split that leaves the purest groups.

Q. What do pure and impure mean?

Pure = all one class (impurity 0). Impure = mixed (maximum at an even mix, such as 50/50 for two classes). Gini impurity or entropy puts a number on it.

Q. Why does an unrestrained tree overfit?

It keeps splitting until every leaf is pure, often down to a single training example, memorizing noise instead of learning the pattern. It fits the training data perfectly and generalizes poorly to new data.

Q. How are trees kept from overfitting?

Limit the depth, require a minimum number of examples per leaf, or grow the tree fully and prune branches that do not earn their keep.

Q. Name two strengths of decision trees.

Any two: interpretable (you can read and explain the path), captures non-linear patterns, handles mixed feature types, needs no input rescaling.

Q. What is the key weakness of a single tree?

Instability (high variance): a small change in the data can produce a very different tree, and it overfits easily.

Q. What is a regression tree?

A decision tree that predicts a number instead of a class. Each leaf outputs the average of the training values that land in it.
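A minimal sketch of that idea is a one-question regression tree whose two leaves each store the mean of their training values. The data (sizes and prices) is made up, the threshold is fixed by hand rather than learned, and the names `fit_stump` and `predict` are hypothetical.

```python
from statistics import mean

def fit_stump(xs, ys, threshold):
    """One question ("x <= threshold?") with a mean-valued leaf on each side."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return {"threshold": threshold, "left": mean(left), "right": mean(right)}

def predict(stump, x):
    return stump["left"] if x <= stump["threshold"] else stump["right"]

# Made-up training data: a size feature and a numeric target.
sizes = [50, 60, 70, 120, 130, 140]
prices = [100, 110, 120, 200, 210, 220]

stump = fit_stump(sizes, prices, threshold=100)
print(predict(stump, 65))    # mean of the three small examples
print(predict(stump, 125))   # mean of the three large examples
```

Every new example that lands in the same leaf gets the same number, which is why a regression tree's predictions form a step function rather than a smooth curve.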

Q. Why do decision trees matter beyond this lesson?

They are the building block of random forests and gradient-boosted trees, the workhorse models for tabular data, and they give a fully auditable decision path.