Cheatsheet: Grouping without labels: k-means clustering

| Term | Meaning |
| --- | --- |
| Clustering | unsupervised: find natural groups in unlabeled data |
| k | the number of clusters (you choose it) |
| Centroid | a cluster’s center: the mean position of its points |

| Step | Action |
| --- | --- |
| 0 | choose k, place k centroids (often random) |
| 1. Assign | each point joins its nearest centroid’s cluster |
| 2. Update | move each centroid to the mean of its assigned points |
| Repeat | steps 1-2 until assignments stop changing |
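
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `kmeans` and `squared_distance`, the sample points, the choice of k random data points as starting centroids, and the rule that an empty cluster keeps its old centroid are all assumptions made for this example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of numeric tuples.

    Returns (centroids, labels); labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Step 0: use k randomly chosen data points as starting centroids (assumption).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 1 (assign): each point joins its nearest centroid's cluster.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Repeat until assignments stop changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2 (update): move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Which group ends up as cluster 0 or 1 depends on the seed; only the grouping itself is meaningful.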

Worked trace (points 1,2,3,10,11,12; k=2; init centroids 2,3)

| Iter | Assign | New centroids |
| --- | --- | --- |
| 1 | A={1,2}, B={3,10,11,12} | 1.5, 9 |
| 2 | A={1,2,3}, B={10,11,12} | 2, 11 |
| 3 | no change | converged |
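
The trace can be reproduced with a few lines of Python. This sketch works in one dimension only and assumes no cluster ever becomes empty, which holds for this data; the helper `nearest` and the `history` list are names introduced for the example.

```python
def nearest(x, centroids):
    """Index of the centroid closest to the 1-D value x."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


points = [1, 2, 3, 10, 11, 12]
centroids = [2.0, 3.0]  # initial centroids from the trace
labels = None
history = []  # (iteration, cluster A, cluster B, centroids after the update)

for iteration in range(1, 11):
    # Assign: each point joins its nearest centroid's cluster.
    new_labels = [nearest(x, centroids) for x in points]
    if new_labels == labels:
        print(f"iter {iteration}: no change, converged")
        break
    labels = new_labels
    # Update: move each centroid to the mean of its assigned points.
    clusters = [[x for x, lab in zip(points, labels) if lab == j] for j in range(2)]
    centroids = [sum(c) / len(c) for c in clusters]
    history.append((iteration, clusters[0], clusters[1], list(centroids)))
    print(f"iter {iteration}: A={clusters[0]}, B={clusters[1]}, centroids={centroids}")

# iter 1: A=[1, 2], B=[3, 10, 11, 12], centroids=[1.5, 9.0]
# iter 2: A=[1, 2, 3], B=[10, 11, 12], centroids=[2.0, 11.0]
# iter 3: no change, converged
```
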

| Idea | Note |
| --- | --- |
| Elbow method | run several k, plot cluster tightness, pick the bend |
| Always returns k clusters | even on noise; finding k groups is NOT proof they are real |
| Your job | judge whether the clusters are meaningful |
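
The elbow method can be sketched on the six points from the worked trace. Two assumptions are made here: "tightness" is measured as the within-cluster sum of squared distances (often called WCSS or inertia), and the starting centroids are taken at evenly spaced positions in the sorted data so the run is deterministic, instead of using random restarts. The names `kmeans_1d` and `wcss` are introduced for this example.

```python
def kmeans_1d(points, k, max_iter=100):
    """1-D k-means with a deterministic start (an assumption of this sketch)."""
    pts = sorted(points)
    # Start from k values spread evenly through the sorted data.
    centroids = [float(pts[i * len(pts) // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in pts]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(pts, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = sum(members) / len(members)
    return pts, centroids, labels


def wcss(points, k):
    """Within-cluster sum of squared distances: lower means tighter clusters."""
    pts, centroids, labels = kmeans_1d(points, k)
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(pts, labels))


points = [1, 2, 3, 10, 11, 12]
scores = {k: wcss(points, k) for k in range(1, 5)}
for k, score in scores.items():
    print(k, score)
```

On this data the score drops sharply from k=1 to k=2 and only slightly after that, so the bend sits at k=2. The score keeps shrinking as k grows, which is why the bend matters and not the minimum.
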

| Limitation | Note |
| --- | --- |
| Must choose k | strongly shapes the result |
| Init-sensitive | run several times; k-means++ seeds smartly |
| Assumes round clusters | struggles with elongated or unequal groups |
| Distance-based | scale features first |
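
To see why scaling matters for a distance-based method, the sketch below compares distances before and after standardizing each feature to mean 0 and standard deviation 1 (z-scores). Standardization is one common choice of scaling, assumed here; the `standardize` helper and the age/income rows are hypothetical.

```python
import math


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        # "or 1.0" avoids division by zero for a constant column
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stds))
        for row in rows
    ]


# Hypothetical data: (age in years, income in dollars)
rows = [(25, 40_000), (60, 41_000), (27, 90_000)]

# Distances from the first row to the other two, before and after scaling.
raw = [math.dist(rows[0], r) for r in rows[1:]]
scaled_rows = standardize(rows)
scaled = [math.dist(scaled_rows[0], r) for r in scaled_rows[1:]]

print("raw:   ", raw)
print("scaled:", scaled)
```

In the raw data the income column dominates: a 35-year age gap counts for almost nothing next to a difference of a few thousand dollars. After scaling, both features contribute on a comparable footing.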