Grouping without labels: k-means clustering
What you’ll learn
This is lesson 9 of Track 10, the opener of Phase 3 (Finding structure without labels). By the end you will be able to walk through the k-means clustering loop by hand, from a starting guess to convergence, and judge when clustering is the right tool and when it will mislead you. The one capability to walk away with: run the assign-and-update loop yourself, and recognize that k-means always returns the number of clusters you ask for, real or not.
The track structurally mirrors StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.
Where this fits
This lesson opens the unsupervised phase. Phases 1 and 2 were entirely supervised: every model learned from labeled answers. Here the labels are gone, and the goal shifts from predicting a known answer to discovering structure in raw data. K-means is the natural starting point, the simplest and most widely used clustering method. The next lesson, hierarchical clustering, tackles the same job without making you choose the number of clusters in advance.
Before you start
Prerequisite: Lesson 1, What machine learning actually is. You need the distinction between supervised and unsupervised learning, because this lesson is the first to work with unlabeled data, exactly the unsupervised case lesson 1 described. No math beyond computing averages and comparing distances.
By the end, you’ll be able to
- Explain how clustering differs from supervised learning
- Walk through the k-means assign-and-update loop to convergence
- Use the elbow method to choose k
- Explain why k-means always returns k clusters and why that demands judgment
- Name the main limitations of k-means
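
For readers who like to see the loop before working it by hand, here is a minimal sketch of assign-and-update on one-dimensional data, in plain Python. The function name `kmeans_1d`, the sample points, and the starting centroids are illustrative assumptions, not part of the lesson's materials, and keeping the old centroid when a cluster ends up empty is just one common convention.

```python
def kmeans_1d(points, centroids, max_iters=100):
    """Run the assign-and-update loop on 1-D data until the centroids stop moving.

    Hypothetical helper for illustration; not a library function.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iters):
        # Assign: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)

        # Update: move each centroid to the average of its assigned points.
        # Assumption: an empty cluster simply keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else old
            for c, old in zip(clusters, centroids)
        ]

        if new_centroids == centroids:  # nothing moved, so the loop has converged
            break
        centroids = new_centroids
    return centroids, clusters


# Two visibly separate groups, with a deliberately poor starting guess.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 2.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]

# Evenly spaced data has no obvious groups, yet asking for 3 still yields 3.
even_points = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
even_centroids, even_clusters = kmeans_1d(even_points, centroids=[1.0, 2.0, 3.0])
print(len(even_clusters))  # 3
```

The second call is the point worth noticing: the data is evenly spaced, but the loop still hands back exactly as many clusters as were requested.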
Time and difficulty
- Read time: about 12 minutes
- Practice time: about 15 minutes (a by-hand iteration exercise, a judgment question, and flashcards)
- Difficulty: standard