References: Building a hierarchy: hierarchical clustering

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Hierarchical Clustering"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=7xHsRkOdVwo
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)

Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Clustering module)
Repository: https://github.com/microsoft/ML-For-Beginners
License: MIT

Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.

  • StatQuest’s “Hierarchical Clustering” anchors the bottom-up merging process, the dendrogram, and how merge height encodes distance. The lesson expands the cut-the-tree examples, the tallest-gap heuristic, and the warning that left-to-right leaf order is meaningless into its core dendrogram-reading skills.
  • Microsoft’s ML-For-Beginners Clustering module is the hands-on companion for running clustering in scikit-learn.

The worked ASCII dendrogram and the explicit comparison table with k-means are Clawdemy’s own.

  • StatQuest with Josh Starmer. The hierarchical clustering video, and StatQuest’s clustering and heatmap material, which shows dendrograms in their natural habitat alongside heatmaps.
  • Microsoft ML-For-Beginners: Clustering. Project-based clustering lessons in scikit-learn, including building and cutting dendrograms with SciPy.
  • Principal component analysis (the next lesson). The other major unsupervised goal: not grouping data but compressing it, reducing many features to a few that capture most of the variation.
  • Linkage methods in depth. Single, complete, average, and Ward’s linkage each shape clusters differently; worth exploring when a dendrogram looks wrong for your data.
  • Heatmaps with dendrograms. The classic bioinformatics visualization, where row and column dendrograms order a heatmap to reveal blocks of similar genes and samples.
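
For readers who want to try the ideas in the list above before opening the linked resources, here is a minimal sketch of building and cutting a hierarchy with SciPy. It is not taken from the StatQuest or Microsoft material. The toy points, the variable names, and the choice of average linkage are assumptions made for illustration. The cut height uses the tallest-gap heuristic: find the largest jump between consecutive merge heights and cut in the middle of it.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data (made up for illustration): two tight groups of 2-D points.
points = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
])

# Build the hierarchy bottom-up. Each row of Z records one merge:
# [cluster_a, cluster_b, merge_height, size_of_new_cluster].
# "average" is an assumed choice; "single", "complete", and "ward"
# are other valid values for `method`.
Z = linkage(points, method="average")
heights = Z[:, 2]

# Tallest-gap heuristic: locate the largest jump between consecutive
# merge heights and place the cut halfway up that jump.
gaps = np.diff(heights)
i = int(np.argmax(gaps))
cut_height = (heights[i] + heights[i + 1]) / 2

# Cut the tree: points whose merges all happen below cut_height
# end up with the same cluster label.
labels = fcluster(Z, t=cut_height, criterion="distance")
n_clusters = len(set(labels))

print("cut height:", cut_height)
print("clusters:", n_clusters, "labels:", labels)
```

Swapping the `method` argument is a quick way to see how the linkage choice changes the merge heights, and passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree if matplotlib is installed.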

None selected for this lesson. Hierarchical clustering is well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.