References: Building a hierarchy: hierarchical clustering
Source material
Source material (conceptual spine):
• StatQuest with Josh Starmer: "Hierarchical Clustering"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=7xHsRkOdVwo
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Clustering module)
  Repository: https://github.com/microsoft/ML-For-Beginners
  License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material for educational purposes. All rights to the original videos and curriculum remain with their creators.
What this lesson draws from each source
- StatQuest’s “Hierarchical Clustering” anchors the bottom-up merging process, the dendrogram, and how merge height encodes distance. The cut-the-tree examples, the tallest-gap heuristic, and the left-right-is-meaningless warning are built out here as the core reading skills.
- Microsoft’s ML-For-Beginners Clustering module is the hands-on companion for running clustering in scikit-learn.
The worked ASCII dendrogram and the explicit comparison table with k-means are Clawdemy’s own.
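As a compact reminder of the ideas credited above, here is a minimal sketch of bottom-up merging on 1-D points. It records the height of each merge and then applies the tallest-gap heuristic to pick a cut. It is an illustration only, not code from either source: the function names `single_linkage_merges` and `tallest_gap_cut` are hypothetical, single linkage is assumed for simplicity, and the five data points are made up.

```python
from itertools import combinations


def single_linkage_merges(points):
    """Agglomerative clustering of 1-D points using single linkage.

    Returns (height, merged_cluster) tuples in the order the merges happen.
    Hypothetical helper for illustration; not taken from any library.
    """
    clusters = [(p,) for p in sorted(points)]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Single linkage: cluster distance is the distance of the closest pair.
        (i, j), height = min(
            (
                ((i, j), min(abs(a - b) for a in clusters[i] for b in clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append((height, merged))  # height = distance at which they joined
    return merges


def tallest_gap_cut(merges):
    """Pick a cut height in the middle of the largest jump between merge heights.

    Returns (cut_height, number_of_clusters_left_after_cutting).
    """
    heights = [h for h, _ in merges]
    gap, k = max((heights[k + 1] - heights[k], k) for k in range(len(heights) - 1))
    cut = (heights[k] + heights[k + 1]) / 2
    # Every merge above the cut is undone, and each undone merge adds one cluster.
    return cut, len(heights) - k


merges = single_linkage_merges([1, 2, 9, 10, 25])  # made-up example data
for height, members in merges:
    print(height, members)
print(tallest_gap_cut(merges))
```

With these points the merges happen at heights 1, 1, 7, and 15. The largest jump is between 7 and 15, so the sketch cuts at 11 and leaves two clusters: the point 25 on its own and everything else together.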
Going deeper
- StatQuest with Josh Starmer. The hierarchical clustering video, and StatQuest’s clustering and heatmap material, which shows dendrograms in their natural habitat alongside heatmaps.
- Microsoft ML-For-Beginners: Clustering. Project-based clustering lessons in scikit-learn, including building and cutting dendrograms with SciPy.
Adjacent topics
- Principal component analysis (the next lesson). The other major unsupervised goal: not grouping data but compressing it, reducing many features to a few that capture most of the variation.
- Linkage methods in depth. Single, complete, average, and Ward’s linkage each shape clusters differently; worth exploring when a dendrogram looks wrong for your data.
- Heatmaps with dendrograms. The classic bioinformatics visualization, where row and column dendrograms order a heatmap to reveal blocks of similar genes and samples.
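To make the linkage item above concrete: single, complete, and average linkage differ only in how they turn point-to-point distances into one cluster-to-cluster distance. The sketch below computes all three for two made-up 1-D clusters. The helper `cluster_distance` is a hypothetical name, and Ward’s linkage is left out because it is based on the increase in within-cluster variance rather than on pairwise distances alone.

```python
def cluster_distance(a, b, linkage):
    """Distance between two clusters of 1-D points under a given linkage rule.

    Hypothetical helper for illustration; not a library function.
    """
    pairwise = [abs(x - y) for x in a for y in b]  # all cross-cluster distances
    if linkage == "single":    # closest pair of points
        return min(pairwise)
    if linkage == "complete":  # farthest pair of points
        return max(pairwise)
    if linkage == "average":   # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")


left, right = [0, 1, 2], [6, 10]  # made-up example clusters
distances = {
    name: cluster_distance(left, right, name)
    for name in ("single", "complete", "average")
}
print(distances)
```

For these two clusters the sketch gives 4 for single, 10 for complete, and 7.0 for average linkage. The same pair of clusters can therefore merge at quite different heights depending on the rule, which is one reason a dendrogram’s shape changes with the linkage method.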
Community discussion
None selected for this lesson. Hierarchical clustering is well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.