References: Grouping without labels: k-means clustering
Source material
Source material (conceptual spine):
• StatQuest with Josh Starmer: "K-Means Clustering"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=4b5d3muPQmA
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)

Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Clustering module)
  Repository: https://github.com/microsoft/ML-For-Beginners
  License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material for educational purposes. All rights to the original videos and curriculum remain with their creators.
What this lesson draws from each source
- StatQuest’s “K-Means Clustering” anchors the assign-update loop, the role of the centroid, and the elbow method for choosing k. The lesson expands on two points: a by-hand trace on a number line, and the explicit caution that k-means always returns k clusters, even on noise, which serves as the lesson’s core point of honesty.
- Microsoft’s ML-For-Beginners Clustering module is the hands-on companion: it runs k-means in Python with scikit-learn on real data, including the feature-scaling step this lesson flags.
The number-line worked example, the recovery-from-bad-init framing, and the connection to clustering language-model embeddings are Clawdemy’s own.
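The assign-update loop and the quantity behind the elbow method can be sketched in plain Python on a number line. This is a minimal illustration under stated assumptions, not the lesson’s own worked example: the data, the starting centroids, and the helper names `assign`, `kmeans_1d`, and `inertia` are made up here, ties go to the lower-indexed centroid, and an empty cluster keeps its old centroid (one convention among several). It shows the loop recovering from a poor start, the within-cluster sum of squares that an elbow plot tracks as k grows, and k = 3 clusters coming back from evenly spaced points that have no real groups.

```python
def assign(points, centroids):
    """Assign step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for x in points:
        # Ties go to the lower-indexed centroid (min returns the first minimum).
        nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        clusters[nearest].append(x)
    return clusters


def kmeans_1d(points, centroids, max_iter=100):
    """Alternate assign and update steps until the centroids stop moving."""
    centroids = [float(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid (an assumption here).
        updated = [sum(c) / len(c) if c else m for c, m in zip(clusters, centroids)]
        if updated == centroids:
            break
        centroids = updated
    # Re-assign so the returned clusters match the returned centroids.
    return centroids, assign(points, centroids)


def inertia(centroids, clusters):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    return sum((x - m) ** 2 for m, c in zip(centroids, clusters) for x in c)


points = [1, 2, 3, 10, 11, 12]

# A poor start: both centroids begin inside the left-hand group.
centroids, clusters = kmeans_1d(points, [1, 2])
print(centroids, clusters)  # [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]

# Elbow check with deliberately naive starts (the first k points).
for k in (1, 2, 3):
    c, cl = kmeans_1d(points, points[:k])
    print(k, inertia(c, cl))  # 1 125.5, then 2 4.0, then 3 2.5

# Evenly spaced points have no real groups, yet k = 3 still yields 3 clusters.
_, noise_clusters = kmeans_1d(list(range(9)), [0, 4, 8])
print(noise_clusters)  # [[0, 1, 2], [3, 4, 5, 6], [7, 8]]
```

Each run converges to a local solution that depends on where the centroids start; this particular data recovers from the poor start, but that is not guaranteed in general. The sharp drop from k = 1 to k = 2, followed by only a small gain at k = 3, is the “elbow” shape.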
Going deeper
- StatQuest with Josh Starmer. The k-means video plus related clustering explainers. StatQuest is especially good on the elbow method and on why initialization matters.
- Microsoft ML-For-Beginners: Clustering. Project-based clustering lessons in scikit-learn, where you can run k-means and visualize the result.
Adjacent topics
- Hierarchical clustering (the next lesson). Groups data without committing to a number of clusters in advance, building a tree of nested clusters that shows structure at every scale.
- k-means++. The standard, smarter initialization: it spreads the starting centroids apart, reducing the chance of converging to a poor local solution. A practical companion to the initialization sensitivity noted in this lesson.
- DBSCAN. A density-based clustering method that finds clusters of arbitrary shape and does not need k chosen in advance, addressing two of k-means’ limitations. Worth knowing about, though outside this track’s scope.
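The k-means++ idea listed above can be sketched in a few lines: pick the first centroid uniformly at random from the data, then pick each further centroid with probability proportional to its squared distance from the nearest centroid already chosen. This is a minimal sketch on a number line; the function name `kmeans_plus_plus_init` and the example data are illustrative assumptions, and library implementations may add refinements on top of this basic scheme.

```python
import random


def kmeans_plus_plus_init(points, k, rng):
    """k-means++ seeding (D^2 sampling) on a number line; a sketch, not a library API."""
    # First centroid: a data point chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min((x - c) ** 2 for c in centroids) for x in points]
        if sum(d2) == 0:
            raise ValueError("fewer distinct points than k")
        # Sample the next centroid with probability proportional to d2:
        # far-away points are likely picks, already-chosen values never are.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


rng = random.Random(0)  # fixed seed so the run is repeatable
print(kmeans_plus_plus_init([1, 2, 3, 10, 11, 12], 2, rng))
```

On this data the second seed lands in the opposite group from the first in the large majority of runs, which is the spreading-out behavior. The returned values would then serve as the starting centroids for the ordinary assign-update loop.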
Community discussion
None selected for this lesson. K-means is well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.