
Cheatsheet: Squeezing dimensions: PCA

| Term | Meaning |
|---|---|
| PC1 | direction of most variation in the data |
| PC2 | perpendicular to PC1; next-most variation |
| PC3, … | each perpendicular to all prior, next-most variation |
| Order | PCs sorted by variance explained, largest first |

| Item | Detail |
|---|---|
| Form | weighted combination of the original features |
| Weights | loadings (which original features matter for this PC) |
| Interpretability | a thin slice via loadings; not as direct as an original feature |
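
To make "weighted combination" and "loadings" concrete, here is a minimal sketch using NumPy's SVD. The toy data, the random seed, and the variable names (`loadings`, `scores`, `variance`) are assumptions made for illustration; they are not part of the cheatsheet.

```python
import numpy as np

# Toy data (made up for illustration): 200 samples, 3 features,
# where the first two features share one underlying factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(200, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

# Center the data, then take the SVD; the rows of Vt are the PC directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt                     # loadings[i, j]: weight of feature j in PC(i+1)
scores = Xc @ Vt.T                # each PC score is a weighted sum of the features
variance = S**2 / (len(X) - 1)    # variance along each PC, largest first

print("PC1 loadings:", np.round(loadings[0], 3))
print("variance per PC:", np.round(variance, 3))
```

The rows of `loadings` are mutually perpendicular unit vectors, and `variance` comes out sorted from largest to smallest, which matches the ordering described in the tables above.
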

Variance explained (deciding how many PCs to keep)

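| Tool | How to read it |
|---|---|
| Scree plot | bar/line of variance % per PC, sorted descending |
| Elbow | the point where additional PCs add little |
| Rule of thumb | keep enough PCs for 90 to 95% cumulative variance |

| Example | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 |
|---|---|---|---|---|---|---|
| variance | 60% | 25% | 8% | 4% | 2% | 1% |
| cumulative | 60% | 85% | 93% | 97% | 99% | 100% |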

Keep 3 PCs to reach >=90% cumulative variance (93%); the elbow falls between PC3 and PC4.
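
The rule of thumb can be checked mechanically. The sketch below uses the example percentages from this section; the helper name `n_components_for` is hypothetical and chosen only for this illustration.

```python
from itertools import accumulate

# Per-PC variance shares from the example (percent).
variance_pct = [60, 25, 8, 4, 2, 1]
cumulative = list(accumulate(variance_pct))  # [60, 85, 93, 97, 99, 100]

def n_components_for(threshold_pct, cumulative_pct):
    """Smallest number of PCs whose cumulative variance reaches the threshold."""
    for k, total in enumerate(cumulative_pct, start=1):
        if total >= threshold_pct:
            return k
    return len(cumulative_pct)

print(n_components_for(90, cumulative))  # 3
print(n_components_for(95, cumulative))  # 4
```

Note that moving the threshold from 90% to 95% adds one more PC here, so the choice inside the 90 to 95% band is a judgment call rather than a fixed rule.
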

| Purpose | Why |
|---|---|
| Visualization | plot PC1 vs PC2 to see structure in high-D data |
| Speed | fewer features, faster downstream model |
| De-noising | drop low-variance PCs (often noise) |
| Preprocessing | feed reduced features into clustering or classification |

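
As a rough illustration of the reduction and de-noising uses, the sketch below projects data onto the top `k` PCs and then maps it back. The synthetic data (two hidden factors spread over six features plus small noise), the seed, and the names `Z` and `X_denoised` are assumptions for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 2 underlying factors mixed into 6 features, plus small noise.
factors = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 6))
X = factors @ mixing + 0.05 * rng.normal(size=(300, 6))

# PCA via SVD of the centered data; rows of Vt are the PC directions.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

k = 2
Z = (X - mean) @ Vt[:k].T        # reduced features, shape (300, 2)
X_denoised = Z @ Vt[:k] + mean   # back in the original 6-D space, low-variance PCs dropped

# Relative reconstruction error: how much was lost by keeping only k PCs.
rel_err = np.linalg.norm(X - X_denoised) / np.linalg.norm(X - mean)
print(Z.shape, round(float(rel_err), 4))
```

`Z` is what one would plot (PC1 vs PC2) or pass to a downstream clustering or classification step; `X_denoised` is the same data with the low-variance directions removed.
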
| Issue | What to do |
|---|---|
| Linear only | use a nonlinear method (e.g., t-SNE) for curved/cluster structure |
| Scale-sensitive | standardize features (mean 0, variance 1) before PCA |
| High variance assumed = signal | usually true; not always |
| PCs are mixtures | less interpretable than originals; read with loadings |
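
The scale-sensitivity issue is easy to see on a small example. In this sketch the two features (`height_m`, `income`) and their distributions are made up for illustration, and `explained_ratio` is a hypothetical helper, not a library function.

```python
import numpy as np

def explained_ratio(X):
    """Fraction of total variance captured by each PC (SVD of centered data)."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)
    return S**2 / np.sum(S**2)

# Hypothetical data: two independent features on very different scales.
rng = np.random.default_rng(0)
height_m = rng.normal(1.7, 0.1, size=500)
income = rng.normal(50_000, 10_000, size=500)
X = np.column_stack([height_m, income])

# Standardize each feature to mean 0, variance 1 before PCA.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)

print("raw:         ", np.round(explained_ratio(X), 4))
print("standardized:", np.round(explained_ratio(Xz), 4))
```

On the raw data PC1 is essentially just the large-scale feature, so it appears to explain nearly all the variance; after standardizing, the two independent features contribute about equally.
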