
References: Grouping without labels: k-means clustering

Source material (conceptual spine):
• StatQuest with Josh Starmer: "K-Means Clustering"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=4b5d3muPQmA
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Clustering module)
Repository: https://github.com/microsoft/ML-For-Beginners
License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.
  • StatQuest’s “K-Means Clustering” anchors the assign-update loop, the role of the centroid, and the elbow method for choosing k. This lesson builds out the by-hand line trace and the explicit “always returns k clusters even on noise” caution as its core point of honesty.
  • Microsoft’s ML-For-Beginners Clustering module is the hands-on companion: it runs k-means in Python with scikit-learn on real data, including the feature-scaling step this lesson flags.

The number-line worked example, the recovery-from-bad-init framing, and the connection to clustering language-model embeddings are Clawdemy’s own.
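For readers who want the assign-update loop in runnable form, here is a minimal sketch on a number line. The function name `kmeans_1d`, the sample points, and the deliberately poor starting centroids are illustrative assumptions, not code taken from StatQuest or the Microsoft module; keeping an empty cluster's old centroid is just one simple convention.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on a number line: assign, update, repeat until stable."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assign step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        # An empty cluster keeps its old centroid (an assumed convention).
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # nothing moved, so the assignments are stable
        centroids = updated
    return centroids, clusters


# Two obvious groups, but both starting centroids sit inside the left group.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 2.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

In this particular case the loop recovers from the poor start within a few passes. That is not guaranteed in general, and the function returns exactly as many centroids as it was given whether or not the data contains that many real groups.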

  • Hierarchical clustering (the next lesson). Groups data without committing to a number of clusters up front, building a tree of nested clusters that shows structure at every scale.
  • k-means++. The standard smarter initialization that spreads the starting centroids out, reducing the chance of a poor local solution. A practical companion to the init-sensitivity noted here.
  • DBSCAN. A density-based clustering method that finds clusters of arbitrary shape and does not need k chosen in advance, addressing two of k-means’ limitations. Worth knowing it exists, though it is outside this track’s scope.
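As a rough illustration of the k-means++ idea listed above, the sketch below seeds centroids on a number line by sampling each new centroid with probability proportional to its squared distance from the nearest centroid already chosen. The function name `kmeans_pp_init` and the sample data are hypothetical, and the sketch assumes the data holds at least k distinct values.

```python
import random


def kmeans_pp_init(points, k, rng):
    """Pick k starting centroids from 1-D points, k-means++ style."""
    # First centroid: chosen uniformly at random from the data.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min((p - c) ** 2 for c in centroids) for p in points]
        # Sample the next centroid with probability proportional to d2:
        # distant points are favored, already-chosen points have weight 0.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
seeds = kmeans_pp_init(points, k=2, rng=random.Random(0))
print(seeds)
```

The seeds are always drawn from the data itself, and spreading them out this way makes a start like two centroids inside the same group less likely, though still possible.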

None selected for this lesson. K-means is well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.