
Cheatsheet: Building a hierarchy: hierarchical clustering

| Step | Action |
|---|---|
| 1 | every point is its own cluster |
| 2 | merge the two closest clusters |
| 3 | repeat until one cluster remains |
| Output | the full merge history (a tree); no k chosen up front |
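
The three steps translate almost line for line into code. This is a minimal, deliberately naive sketch in Python using only the standard library. It assumes points are coordinate tuples compared by Euclidean distance, and `agglomerate` is a hypothetical helper name, not a library function. Passing `min` or `max` as `linkage` gives single or complete linkage. Every pass rescans all cluster pairs, so it runs in roughly O(n³) time and is only meant for toy inputs.

```python
import math
from itertools import combinations


def agglomerate(points, linkage=min):
    """Naive agglomerative clustering (illustrative sketch).

    points:  list of coordinate tuples.
    linkage: reduces all point-to-point distances between two clusters
             to one number (min = single linkage, max = complete linkage).
    Returns the merge history as (members_a, members_b, height) tuples.
    """
    # Step 1: every point starts as its own cluster (a tuple of indices).
    clusters = [(i,) for i in range(len(points))]
    history = []

    def cluster_distance(a, b):
        return linkage(math.dist(points[i], points[j]) for i in a for j in b)

    # Step 3: repeat until only one cluster remains.
    while len(clusters) > 1:
        # Step 2: find the two closest clusters and merge them.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        history.append((a, b, height))
    return history


points = [(0, 0), (0, 1), (5, 0), (5, 2), (12, 0)]  # toy data
history = agglomerate(points)  # single linkage
for a, b, height in history:
    print(a, "+", b, "at height", height)
# merge heights: 1.0, 2.0, 5.0, 7.0
```

Libraries such as SciPy (`scipy.cluster.hierarchy.linkage`) use much faster update rules and return the merge history as an (n-1) × 4 numeric array instead of tuples.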
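
| Feature | Meaning |
|---|---|
| Leaves (bottom) | individual points |
| A merge connector | two clusters joining |
| Height of a merge | distance between the two clusters at the moment they merge (as defined by the chosen linkage) |
| Low merge | very similar |
| High merge | quite different |
| Left-right order | meaningless (branches rotate freely) |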
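
Reading a merge height off the tree for two specific points gives their cophenetic distance: the height of the merge that first puts them in the same cluster. This small sketch assumes the merge history is a list of `(members_a, members_b, height)` tuples in merge order. The toy history is hand-written for illustration, and `cophenetic` is a hypothetical name. Swapping left and right in every merge leaves every height unchanged, which is why left-right order carries no meaning.

```python
def cophenetic(history, i, j):
    """Height at which points i and j first end up in the same cluster.

    history: merges in the order they happened, each a
             (members_a, members_b, height) tuple.
    """
    if i == j:
        return 0.0
    # Clusters only grow, so the first merge whose combined members
    # include both points is the merge that joins them.
    for a, b, height in history:
        joined = set(a) | set(b)
        if i in joined and j in joined:
            return height
    raise ValueError("points never merged")


# Toy merge history for 5 points (hand-written for illustration).
history = [((0,), (1,), 1.0), ((2,), (3,), 2.0),
           ((0, 1), (2, 3), 5.0), ((4,), (0, 1, 2, 3), 7.0)]

print(cophenetic(history, 0, 1))  # 1.0: low merge, very similar
print(cophenetic(history, 1, 4))  # 7.0: high merge, quite different

# Rotating every branch (swapping left and right) changes no heights.
flipped = [(b, a, h) for a, b, h in history]
assert all(cophenetic(history, i, j) == cophenetic(flipped, i, j)
           for i in range(5) for j in range(5))
```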

| Cut | Result |
|---|---|
| Horizontal line | number of branches crossed = number of clusters |
| Cut low | many small, tight clusters |
| Cut high | few broad clusters |
| Best cut | across the tallest gap with no merges; usually the most natural split |
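
Counting clusters at a cut only needs the merge heights: every merge below the line has already happened, and each one turns two clusters into one. The sketch below uses hypothetical helper names. It assumes heights never decrease as merging proceeds, which holds for single, complete, average and Ward's linkage, and it treats the tallest gap as a heuristic rather than a guarantee.

```python
def clusters_at(heights, n_points, cut):
    """Number of clusters left after cutting the tree at height `cut`.

    Each merge below the cut has already happened and lowers the
    count by one.
    """
    return n_points - sum(1 for h in heights if h < cut)


def tallest_gap_cut(heights):
    """Cut height in the middle of the largest gap between consecutive merges.

    Only gaps between merges are considered, not the open space above
    the final merge. Assumes at least two merges.
    """
    hs = sorted(heights)
    gap, k = max((hs[idx + 1] - hs[idx], idx) for idx in range(len(hs) - 1))
    return (hs[k] + hs[k + 1]) / 2


heights = [1.0, 2.0, 5.0, 7.0]       # toy merge heights for 5 points
print(clusters_at(heights, 5, 0.5))  # 5: cut low, every point alone
print(clusters_at(heights, 5, 6.0))  # 2: cut high, few broad clusters
cut = tallest_gap_cut(heights)       # 3.5, the middle of the 2.0 to 5.0 gap
print(clusters_at(heights, 5, cut))  # 3
```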
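
| Linkage | Distance between two clusters |
|---|---|
| Single | the two nearest points, one from each cluster (can chain into long, straggly clusters) |
| Complete | the two farthest points, one from each cluster (compact clusters) |
| Average | mean distance over all cross-cluster pairs |
| Ward’s | least increase in within-cluster spread (sum of squared distances to the centroids) |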
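
Each linkage is just a different function of two clusters. This sketch assumes clusters are lists of coordinate tuples and Euclidean distance; `spread` and the other function names are illustrative, not a library API. Ward's criterion is computed directly as the growth in summed squared distances to the centroid. Some libraries plot a rescaled version of this value, so compare orderings rather than raw heights across tools.

```python
import math


def single(A, B):
    # Nearest pair, one point from each cluster.
    return min(math.dist(a, b) for a in A for b in B)


def complete(A, B):
    # Farthest pair, one point from each cluster.
    return max(math.dist(a, b) for a in A for b in B)


def average(A, B):
    # Mean distance over all len(A) * len(B) cross-cluster pairs.
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))


def spread(C):
    # Within-cluster spread: sum of squared distances to the centroid.
    centroid = [sum(coord) / len(C) for coord in zip(*C)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in C)


def ward(A, B):
    # How much the total spread grows if A and B are merged.
    return spread(A + B) - spread(A) - spread(B)


A = [(0, 0), (0, 2)]
B = [(3, 0), (5, 0)]
print(single(A, B))    # 3.0
print(complete(A, B))  # sqrt(29), about 5.385
print(ward(A, B))      # 17.0
```

For these toy clusters the Ward value matches the closed form |A||B| / (|A| + |B|) × ‖c_A − c_B‖², here (2·2/4) × 17 = 17, where c_A = (0, 1) and c_B = (4, 0) are the centroids.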
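
|  | K-means | Hierarchical |
|---|---|---|
| Choose k first | yes | no (cut later) |
| Output | flat groups | a tree (dendrogram) |
| Multi-scale view | no | yes |
| Speed / scale | fast, scalable | slower; naive versions store all n(n-1)/2 pairwise distances, so poor on large data |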
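
Part of the "poor on large data" row is plain storage: a straightforward hierarchical implementation keeps every pairwise distance, while k-means only keeps the points and k centroids. The back-of-envelope sketch below assumes 8-byte floats and a condensed matrix with one entry per unordered pair; `distance_matrix_bytes` is a hypothetical helper.

```python
def distance_matrix_bytes(n, bytes_per_value=8):
    # One distance per unordered pair of points: n * (n - 1) / 2 values,
    # each stored as a 64-bit float by default.
    return n * (n - 1) // 2 * bytes_per_value


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} points: {distance_matrix_bytes(n) / 1e9:.3f} GB")
# about 0.4 GB at 10,000 points and about 40 GB at 100,000 points
```

Memory grows with the square of the number of points, so a tenfold increase in data needs roughly a hundredfold more storage before any merging starts.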