
Summary: Squeezing dimensions: PCA

PCA finds new axes (principal components) ordered by how much the data varies along each one. You keep the first few and throw away the rest, compressing many features into a handful while preserving most of the signal. It is a standard tool for reducing dimensions, visualizing high-dimensional data, and preprocessing data before it goes into another model. This summary is the quick-scan version of the full lesson.
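The core computation fits in a few lines of NumPy. This is a minimal sketch, assuming NumPy is available. It uses an eigendecomposition of the covariance matrix. Library implementations such as scikit-learn's `PCA` work from a singular value decomposition of the centered data instead, which yields the same axes. The function name `pca` and the toy data are illustrative and do not come from the lesson.

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch via the covariance matrix.

    X: (n_samples, n_features) array. k: number of components to keep.
    Returns (components, variances, scores):
      components: (k, n_features), row i holds the loadings of PC i+1
      variances:  (k,), variance of the data along each kept PC
      scores:     (n_samples, k), each sample's coordinates on the kept PCs
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center every feature at zero
    cov = np.cov(Xc, rowvar=False)             # columns are features
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric matrix; eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1]          # largest variance first, so PC1 leads
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k].T              # one row of loadings per kept PC
    scores = Xc @ components.T                 # project centered data onto the kept PCs
    return components, eigvals[:k], scores

# Toy data: feature 2 is nearly twice feature 1; feature 3 is small independent noise.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=500), 0.2 * rng.normal(size=500)])

components, variances, scores = pca(X, k=2)
print(components[0])   # PC1 loadings
print(variances)       # variance along PC1 and PC2
```

On this toy data, PC1's loadings should land close to ±(0.45, 0.89, 0.0). That is the direction (1, 2, 0) normalized, the line along which the first two features move together. The sign of each PC is arbitrary, because flipping it describes the same axis.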

  • The question PCA answers. Out of every possible direction through the data, which one does the data vary along the most? That direction is PC1.
  • PCs are ordered by variance. PC1 captures the most variation; PC2 (perpendicular to PC1) the next-most; PC3 (perpendicular to both) the next; and so on. Each PC is a weighted combination of the original features, and its weights (loadings) tell you which original features contribute most to it.
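  • Keep the first few, drop the rest. When features are strongly correlated, the first 2 or 3 PCs often cover 90 to 95 percent of the variation, so you can shrink a hundred features into three with little loss. How many to keep depends on the data. A scree plot of variance explained per PC guides the choice: look for the elbow where the curve flattens.
  • Why “most variation” means “most information.” Along a near-constant direction every point sits at about the same value, so that direction carries little distinguishing information. The high-variance directions hold most of what makes the points different.
  • PCA is linear and scale-sensitive. It finds straight axes, so it cannot capture curved structure. And because it chases variance, features measured on large numeric scales hijack the components regardless of how informative they are. Standardize first (zero mean, unit variance per feature), especially when features come in different units.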
  • Used for: visualization (plot PC1 vs PC2), speed (fewer features), de-noising (drop low-variance PCs), and preprocessing.
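The variance-explained, scree, and standardization points above can be checked numerically. This is a minimal sketch, assuming NumPy. The helper names `explained_variance_ratio` and `choose_k` are hypothetical, and the 95 percent threshold is an illustrative choice that the lesson does not prescribe. The toy data deliberately puts one unrelated feature on a much larger scale to show the hijacking effect.

```python
import numpy as np

def explained_variance_ratio(X, standardize=True):
    """Share of total variance captured by each PC, largest first (sketch)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / Xc.std(axis=0, ddof=1)         # z-scores: every feature now has variance 1
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order
    eigvals = np.clip(eigvals, 0.0, None)        # drop tiny negative round-off
    return eigvals / eigvals.sum()

def choose_k(ratios, threshold=0.95):
    """Smallest number of PCs whose cumulative share reaches `threshold`."""
    cumulative = np.cumsum(ratios)
    k = int(np.searchsorted(cumulative, threshold)) + 1  # first index where cumulative >= threshold
    return min(k, len(ratios))

# Toy data: feature 0 is independent and measured in large units;
# features 1 and 2 are nearly copies of each other on a small scale.
rng = np.random.default_rng(1)
n = 1000
a = 1000.0 * rng.normal(size=n)
b = rng.normal(size=n)
c = b + 0.1 * rng.normal(size=n)
X = np.column_stack([a, b, c])

raw = explained_variance_ratio(X, standardize=False)
std = explained_variance_ratio(X)
print(np.round(raw, 4), choose_k(raw))
print(np.round(std, 4), choose_k(std))
```

Without standardizing, PC1 is essentially feature 0 alone and claims almost all of the variance. One component then looks sufficient even though it ignores the b/c relationship entirely. After standardizing, PC1 captures the shared b/c direction (about two thirds of the variance), and two components cross the 95 percent line. The `std` ratios are what a scree plot would draw.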

When you see a 2D plot of a high-dimensional dataset (embeddings from a language model squashed down, customer segments scattered across two axes, gene-expression samples spread on a page), you now know how such a picture is often made. Someone ran PCA, kept the first two components, and plotted those. That habit of “I have too many features; reduce them first” is one of the most useful instincts to carry forward. The honest limit is also worth remembering: PCA’s straight axes can flatten exactly the curved or cluster-shaped structure you want to see. When that structure is the goal, you reach for a nonlinear method built specifically for visualizing clusters, which is the subject of the next lesson: t-SNE.
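The linear limit shows up even on the simplest curve. This is a minimal sketch, assuming NumPy and synthetic data. Points on a circle have only one degree of freedom (the angle), yet no single straight axis runs along a circle, so PCA needs two components to describe them. The random tilt into 3D is an arbitrary choice that mimics a higher-dimensional dataset.

```python
import numpy as np

# Points evenly spaced on a unit circle: a curve with one degree of freedom (the angle).
n = 360
angles = 2 * np.pi * np.arange(n) / n
circle = np.column_stack([np.cos(angles), np.sin(angles), np.zeros(n)])

# Tilt the circle into 3D with a fixed random rotation so no original axis lines up with it.
rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # Q is orthonormal
X = circle @ Q.T

# Plain PCA: center, eigendecompose the covariance, sort largest variance first.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
eigvals = np.clip(eigvals, 0.0, None)          # drop tiny negative round-off
ratios = eigvals / eigvals.sum()               # share of variance per PC

coords_2d = Xc @ eigvecs[:, :2]                # PC1 vs PC2 coordinates, ready to scatter-plot
print(np.round(ratios, 3))
```

The first two ratios both come out at 0.5 and the third at essentially zero. PCA correctly finds the flat plane the circle lives in but cannot reduce it to one number per point. `coords_2d` holds the PC1 vs PC2 coordinates you would plot, and here they trace the original unit circle.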