Practice: Building a hierarchy: hierarchical clustering
Self-check
Seven short questions. Try to answer each one before revealing the answer.
1. How does hierarchical clustering avoid k-means’ “choose k first” demand?
Show answer
It builds a full tree of nested clusters, from every point alone up to one big cluster. You do not pick the number of clusters in advance; you choose where to cut the tree afterward.
2. Describe the agglomerative (bottom-up) procedure.
Show answer
Start with every point as its own cluster. Repeatedly merge the two closest clusters into one. Continue until everything is in a single cluster, recording each merge and its distance.
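The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: it assumes one-dimensional, distinct points and uses single linkage (nearest members) as the cluster distance. The function names `single_link` and `agglomerate` are made up for this example.

```python
from itertools import combinations

def single_link(a, b):
    """Single-linkage distance: the gap between the two nearest members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [(p,) for p in points]   # every point starts as its own cluster
    merges = []                         # entries: (cluster_a, cluster_b, distance)
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        merges.append((a, b, single_link(a, b)))
        # Replace the two clusters with their union.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges

# Assumed toy data: five points on a line.
merges = agglomerate([1.0, 2.0, 4.5, 9.0, 10.5])
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d}")
```

The recorded distances are exactly the merge heights a dendrogram would draw. The brute-force search for the closest pair is fine for a handful of points but is not how a library would do it at scale.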
3. In a dendrogram, what does the height of a merge represent?
Show answer
The distance between the two clusters when they merged. Low merges mean very similar; high merges mean the joined groups were quite different.
4. How do you turn a dendrogram into a specific set of clusters?
Show answer
Cut it with a horizontal line. The number of branches the line crosses is the number of clusters, and each branch below a crossing is one cluster. Cut low for many small clusters, high for few large ones.
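Cutting can be expressed as "replay every merge that happens at or below the cut line, then stop". The sketch below assumes a hypothetical merge log for five leaves A to E, listed in order of increasing height; `cut_tree` is an illustrative helper, not a library function.

```python
def cut_tree(leaves, merges, cut_height):
    """Return the clusters that remain when the tree is cut at cut_height.

    merges: list of (members_a, members_b, height), in order of increasing height.
    """
    clusters = {frozenset([leaf]) for leaf in leaves}   # start from single leaves
    for a, b, height in merges:
        if height > cut_height:
            break                                       # this merge lies above the cut line
        clusters -= {frozenset(a), frozenset(b)}
        clusters.add(frozenset(a) | frozenset(b))
    return sorted(sorted(c) for c in clusters)

# Hypothetical merge history (not taken from the exercises on this page).
leaves = ["A", "B", "C", "D", "E"]
merges = [
    (("A",), ("B",), 1.0),
    (("D",), ("E",), 1.5),
    (("A", "B"), ("C",), 2.5),
    (("A", "B", "C"), ("D", "E"), 6.0),
]

for h in (0.5, 2.0, 4.0):
    print(h, cut_tree(leaves, merges, h))
```

A low cut (0.5) leaves all five leaves separate, a cut at 2.0 gives three clusters, and a cut at 4.0 gives two, which matches the "cut low for many, high for few" rule.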
5. Where is the most natural place to cut, and why?
Show answer
Across the tallest vertical gap, a long stretch where no merges happen. That gap means the groups on either side are genuinely far apart, so the data itself is suggesting the split.
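One way to find that gap in code is to sort the merge heights and look for the largest difference between consecutive ones. This is a heuristic sketch under stated assumptions: the heights are made up, at least two merges exist, and the trivial gap below the first merge is ignored. `tallest_gap_cut` is a hypothetical helper name.

```python
def tallest_gap_cut(heights):
    """Return (cut_height, n_clusters) for the largest gap between consecutive merges."""
    heights = sorted(heights)
    # Each entry is (gap size, lower merge height, upper merge height).
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(heights, heights[1:])]
    gap, lo, hi = max(gaps)
    cut = (lo + hi) / 2                      # place the cut line mid-gap
    # Every merge above the cut is undone, and each undone merge adds one cluster.
    n_clusters = sum(h > cut for h in heights) + 1
    return cut, n_clusters

# Assumed merge heights for a five-point tree.
heights = [1.0, 1.5, 2.5, 6.0]
cut, k = tallest_gap_cut(heights)
print(f"cut at {cut} gives {k} clusters")
```

Here the largest gap lies between 2.5 and 6.0, so the suggested cut is at 4.25 and yields two clusters. Treat the result as a suggestion from the data, not a rule: a tree with several gaps of similar size has no single natural cut.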
6. What is linkage, and name two types.
Show answer
Linkage is how you measure the distance between two clusters (not just two points). Types (any two): single (nearest points), complete (farthest points), average (mean of all pairs), Ward’s (least increase in spread).
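To see how the linkage choice changes the answer, the sketch below computes all four measures for the same pair of clusters. It assumes one-dimensional points and made-up values. For Ward's, it computes the increase in within-cluster sum of squares that the merge would cause, which is the quantity Ward's method minimises; libraries may report a rescaled version of it.

```python
from statistics import mean

def single(a, b):
    """Distance between the nearest pair of members."""
    return min(abs(x - y) for x in a for y in b)

def complete(a, b):
    """Distance between the farthest pair of members."""
    return max(abs(x - y) for x in a for y in b)

def average(a, b):
    """Mean distance over all cross-cluster pairs."""
    return mean(abs(x - y) for x in a for y in b)

def ward_increase(a, b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    return len(a) * len(b) / (len(a) + len(b)) * (mean(a) - mean(b)) ** 2

# Assumed toy clusters on a line.
left, right = [0.0, 1.0], [4.0, 6.0]

print("single  :", single(left, right))
print("complete:", complete(left, right))
print("average :", average(left, right))
print("ward    :", ward_increase(left, right))
```

The same two clusters are 3.0 apart under single linkage, 6.0 under complete, and 4.5 under average, so the order of merges (and therefore the shape of the tree) can differ from one linkage to another.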
7. Why is a leaf’s left-right position in a dendrogram not a measure of similarity?
Show answer
Because branches can be rotated freely without changing the tree’s meaning. Only the height at which two items merge tells you how similar they are, not how close they happen to be drawn along the bottom.
Try it yourself: read and cut the dendrogram
height
  7 |       ______________
    |      |              |
  3 |    __|__            |
    |   |     |           |
  1 |   |    _|_          |
    |   |   |   |         |
  0 |   P   Q   R         S

Answer three questions:
- Which two points are the most similar?
- Cut at height 5. How many clusters, and what are they?
- Cut at height 2. How many clusters, and what are they?
Show answer
- Q and R are most similar: they merge lowest, at height 1.
- Cut at height 5 (between 3 and 7) crosses 2 branches: {P, Q, R} and {S}.
- Cut at height 2 (between 1 and 3) crosses 3 branches: {P}, {Q, R}, and {S} (P has not yet joined Q-R, which happens at height 3).
The tallest gap is between heights 3 and 7, so the two-cluster cut (the second question) is the most natural grouping: S is genuinely far from the others.
Try it yourself: spot the misreading
A colleague looks at the dendrogram above and says: “P is drawn right next to Q, so P and Q must be the most similar pair.” Are they right? Explain.
Show answer
No. Horizontal adjacency in a dendrogram means nothing, because the branches can be rotated freely without changing what the tree says. The only thing that signals similarity is the height at which two items merge. Here Q and R merge at height 1 (most similar), while P does not join them until height 3. So Q and R are the closest pair, even though P happens to be drawn beside Q. Always read the merge heights, never the left-to-right order.
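The same check can be written as code: look up the height of the first merge that puts two leaves in the same cluster (often called their cophenetic distance). The merge list below encodes the P, Q, R, S tree from the exercise; the tuple representation and the helper `merge_height` are assumptions made for this sketch.

```python
def merge_height(merges, x, y):
    """Height of the first merge that puts leaves x and y in the same cluster."""
    for a, b, height in merges:
        if (x in a and y in b) or (x in b and y in a):
            return height
    raise ValueError(f"{x!r} and {y!r} never merge in this tree")

# The exercise tree: Q and R join at 1, P joins them at 3, S joins at 7.
merges = [
    (("Q",), ("R",), 1),
    (("P",), ("Q", "R"), 3),
    (("P", "Q", "R"), ("S",), 7),
]

for pair in (("P", "Q"), ("Q", "R"), ("P", "S")):
    print(pair, "merge at height", merge_height(merges, *pair))
```

Notice that the merge list contains no left-to-right order at all, yet it fully determines the answer: Q and R meet at height 1, while P and Q only meet at height 3. Any drawing order that respects these merges describes the same tree.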
Flashcards
Ten cards.
Q. How does hierarchical clustering differ from k-means on choosing k?
It does not require k in advance. It builds a full tree of nested clusters; you choose how many by cutting the tree afterward.
Q. What is the agglomerative procedure?
Start with every point as its own cluster, repeatedly merge the two closest clusters, and continue until everything is one cluster.
Q. What is a dendrogram?
The tree diagram of the merge history: leaves are points, and each merge is drawn at a height equal to the distance between the clusters it joined.
Q. What does merge height mean in a dendrogram?
How far apart the two merged clusters were. Low merges = very similar; high merges = quite different.
Q. How do you get clusters from a dendrogram?
Cut it with a horizontal line. The number of branches crossed is the number of clusters; each branch below a crossing is one cluster.
Q. Where is the most natural cut?
Across the tallest vertical gap, where no merges happen. That gap means the groups on either side are genuinely far apart.
Q. What is linkage?
How the distance between two clusters (not just two points) is measured: single (nearest), complete (farthest), average, or Ward’s. The choice shapes the tree.
Q. Why is left-right position in a dendrogram meaningless?
Branches can be rotated freely without changing the tree. Only merge height signals similarity, not how close two leaves are drawn.
Q. When should you prefer hierarchical clustering over k-means?
On smaller datasets, when you do not know how many clusters to expect, or when the multi-scale structure (the tree itself) is what you want to see.
Q. What is a downside of hierarchical clustering?
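It is computationally heavy and does not scale to very large datasets: standard algorithms work from all pairwise distances, so memory grows roughly with the square of the number of points. Its greedy merges are also irreversible, so an early bad merge is never undone.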