# Cheatsheet: Squeezing dimensions: PCA

## What PCA finds

| Term | Meaning |
|---|---|
| PC1 | direction along which the data vary most |
| PC2 | perpendicular to PC1; captures the most remaining variation |
| PC3, … | each perpendicular to all earlier PCs; captures the most remaining variation |
| Order | PCs sorted by variance explained, largest first |
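
Here is a minimal NumPy sketch of what the table describes. It assumes the common approach of taking the SVD of the centered data; libraries such as scikit-learn wrap the same idea in a `PCA` class. The helper name `pca` and the synthetic data are illustrative only. The rows of `Vt` come out mutually perpendicular and already sorted by variance, which matches the PC1, PC2, … ordering above.

```python
import numpy as np

def pca(X):
    """Return (components, explained_variance) for X of shape (n_samples, n_features).

    components[i] is the unit direction of PC(i+1); rows are mutually
    perpendicular and sorted by the variance they explain, largest first.
    """
    Xc = X - X.mean(axis=0)  # PCA works on centered data
    # SVD of the centered data: rows of Vt are the principal directions,
    # and NumPy returns singular values s in descending order
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)  # variance of the projections onto each PC
    return Vt, explained_variance

rng = np.random.default_rng(0)
# Hypothetical 2-D data stretched along the diagonal direction (1, 1)
z = rng.normal(size=(500, 1))
X = np.hstack([z, z]) + 0.1 * rng.normal(size=(500, 2))

components, var = pca(X)
print(components[0])  # roughly ±[0.707, 0.707]: the stretched direction
print(var)            # variance per PC, largest first
```

Because the synthetic points lie close to the line y = x, PC1 lands near that diagonal and PC2 is perpendicular to it. The sign of each component is arbitrary, so compare directions only up to ±.
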

## What a principal component IS

| Item | Detail |
|---|---|
| Form | linear (weighted) combination of the original features |
| Weights | loadings: how much each original feature contributes to this PC |
| Interpretability | indirect, by reading the loadings; less direct than an original feature |
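
The following sketch makes "weighted combination" concrete. It again assumes SVD-based PCA in NumPy. The loadings of PC1 are the row `Vt[0]`, and each sample's PC1 score is the sum of its centered feature values times those weights. The feature names `height`, `weight`, and `noise` are hypothetical labels for synthetic data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
size = rng.normal(size=n)
# Hypothetical features: two track a shared "size" factor, one is unrelated noise
X = np.column_stack([
    size + 0.2 * rng.normal(size=n),  # feature 0
    size + 0.2 * rng.normal(size=n),  # feature 1
    0.3 * rng.normal(size=n),         # feature 2 (unrelated)
])
feature_names = ["height", "weight", "noise"]

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings_pc1 = Vt[0]  # PC1's weight on each original feature

# A PC score is a weighted sum of the (centered) original features
scores_pc1 = Xc @ loadings_pc1
manual = sum(w * Xc[:, j] for j, w in enumerate(loadings_pc1))  # same thing, spelled out

for name, w in zip(feature_names, loadings_pc1):
    print(f"{name:>6}: {w:+.2f}")
```

Reading the printed loadings is how you interpret a PC. Here PC1 is close to an equal mix of height and weight (a shared size factor), with almost no weight on the unrelated noise feature.
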

## Variance explained (deciding how many PCs to keep)

| Tool | How to read it |
|---|---|
| Scree plot | bar/line of variance % per PC, sorted descending |
| Elbow | the bend in the scree plot after which additional PCs add little variance |
| Rule of thumb | keep enough PCs for 90 to 95% cumulative variance |

| Example | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 |
|---|---|---|---|---|---|---|
| variance | 60% | 25% | 8% | 4% | 2% | 1% |
| cumulative | 60% | 85% | 93% | 97% | 99% | 100% |

Keep 3 PCs to reach ≥90% cumulative variance (93%). The elbow falls between PC3 and PC4; after it, each additional PC adds 4% or less.
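
The rule of thumb can be applied mechanically to the example row. This sketch assumes you already have each PC's fraction of variance, for instance from an SVD or from scikit-learn's `explained_variance_ratio_`. The function name `n_components_for` is hypothetical.

```python
import numpy as np

def n_components_for(variance_ratio, threshold=0.90):
    """Smallest k such that the first k PCs explain at least `threshold` of the variance."""
    cumulative = np.cumsum(variance_ratio)                # running total: 0.60, 0.85, 0.93, ...
    k = int(np.searchsorted(cumulative, threshold)) + 1   # first index reaching the threshold, as a count
    return min(k, len(variance_ratio))                    # guard against float round-off near 1.0

# Variance fractions from the example table
ratios = [0.60, 0.25, 0.08, 0.04, 0.02, 0.01]
print(n_components_for(ratios, 0.90))  # 3  (cumulative 93%)
print(n_components_for(ratios, 0.95))  # 4  (cumulative 97%)
```

Raising the threshold from 90% to 95% adds one PC here. The elbow helps you judge whether that trade-off is worth it.
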

## When to use PCA

| Purpose | Why |
|---|---|
| Visualization | plot PC1 vs PC2 to see structure in high-D data |
| Speed | fewer features, faster downstream model |
| De-noising | drop low-variance PCs (often noise) |
| Preprocessing | feed reduced features into clustering or classification |
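
Several uses in the table (speed, preprocessing, de-noising) come down to the same two steps: project onto the top k PCs, then optionally map back. The sketch below does both in NumPy. It assumes synthetic data with 10 features driven by 2 hidden factors plus small noise. `pca_reduce` is a hypothetical helper, not a library function.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X onto its top-k PCs; also return the rank-k reconstruction."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T            # (n_features, k): top-k principal directions
    Z = Xc @ W              # reduced features: (n_samples, k)
    X_hat = Z @ W.T + mean  # map back; the low-variance PCs are dropped
    return Z, X_hat

rng = np.random.default_rng(2)
# Hypothetical data: 10 features driven by 2 hidden factors, plus small noise
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
clean = factors @ mixing
X = clean + 0.05 * rng.normal(size=clean.shape)

Z, X_hat = pca_reduce(X, k=2)
print(Z.shape)  # (200, 2): plot Z[:, 0] vs Z[:, 1], or feed Z to a model
print(np.abs(X - clean).mean(), np.abs(X_hat - clean).mean())  # raw vs reconstructed error
```

In this synthetic setup, the rank-2 reconstruction sits closer to the noise-free data than the raw input does, because most of the noise lives in the dropped low-variance PCs. With real data, treat "the dropped variance is noise" as an assumption to check, not a guarantee.
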

## Limitations and the scaling gotcha

| Issue | What to do |
|---|---|
| Linear only | use a nonlinear method (e.g., t-SNE) for curved/cluster structure |
| Scale-sensitive: large-unit features dominate | standardize each feature (mean 0, variance 1) before PCA |
| Assumes high variance = signal | often true, not always; a low-variance PC can still hold the pattern you care about |
| PCs are mixtures | less interpretable than originals; read with loadings |
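
The scaling gotcha is easy to reproduce. In this sketch (synthetic data, hypothetical unit labels), two features measure the same underlying quantity, but one is recorded in grams and so has roughly 1000 times the spread. Without standardizing, PC1 is almost entirely the large-unit feature. After standardizing, both features get equal weight.

```python
import numpy as np

def pc1(X):
    """Unit direction of the first principal component of X (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(3)
z = rng.normal(size=400)
# Hypothetical features measuring the same thing in very different units
length_m = z + 0.3 * rng.normal(size=400)         # meters
mass_g = 1000 * (z + 0.3 * rng.normal(size=400))  # grams: ~1000x larger spread
X = np.column_stack([length_m, mass_g])

raw = pc1(X)
# Standardize: subtract each column's mean, divide by its standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = pc1(X_std)

print(np.round(np.abs(raw), 3))     # mass_g gets almost all the weight
print(np.round(np.abs(scaled), 3))  # both features weighted equally (~0.707)
```

With two standardized, positively correlated features, PC1 is always the equal-weight diagonal. The unstandardized result, by contrast, reflects the choice of units rather than the data's structure.
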