
References: Squeezing dimensions: PCA

Source material (conceptual spine):
• StatQuest with Josh Starmer: "Principal Component Analysis (PCA) Step-by-Step"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=FgakZw6K1QQ
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)
Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Clustering / introductory modules; PCA appears
in supporting material)
Repository: https://github.com/microsoft/ML-For-Beginners
License: MIT
Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.
  • StatQuest’s “PCA Step-by-Step” anchors the directions-of-maximum-variance framing, the role of variance explained, and the scree plot. The 2D oval intuition, the loadings interpretation, and the explicit standardize-first warning are developed here as practical, hands-on guidance.
  • Microsoft’s ML-For-Beginners is the hands-on companion for fitting PCA in Python with scikit-learn alongside clustering and other unsupervised work.

The “PCA as compression” framing and the explicit set-up for nonlinear visualization (the bridge to t-SNE) are Clawdemy’s own connective tissue.

  • StatQuest with Josh Starmer. The PCA step-by-step video, the short 5-minute version, and StatQuest’s covariance and eigenvector material for readers who want the math behind the directions of maximum variance.
  • Microsoft ML-For-Beginners. Project-based lessons where you fit PCA in scikit-learn, including the standardization step this lesson flags.
  • t-SNE (the next lesson). The nonlinear method built specifically to visualize clusters in 2D, picking up where PCA’s straight axes fall short.
  • UMAP. A modern alternative to t-SNE for nonlinear dimensionality reduction, often faster and better at preserving global structure. Worth knowing it exists, though it is outside this track’s scope.
  • Loadings and biplots. The standard way to read which original features drive each PC, and how the PCs relate to the original axes geometrically.
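
For readers who want to see the standardize-first warning and the loadings idea in action before opening the resources above, here is a minimal sketch using only Python's standard library. It is not taken from StatQuest or the Microsoft curriculum: the data values and the helper names `standardize` and `pca_2d` are made up for illustration, and the closed-form eigenvalue shortcut only works for exactly two features. In practice, scikit-learn's `StandardScaler` and `PCA` handle the general case.

```python
import math
import statistics

def standardize(column):
    """Rescale one feature to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.stdev(column)
    return [(x - mean) / std for x in column]

def pca_2d(xs, ys):
    """Hypothetical helper: PCA for exactly two features.

    Returns (explained variance ratios, PC1 loadings).
    """
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Entries of the 2x2 sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    spread = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Eigenvector for the larger eigenvalue: the direction of maximum variance.
    if b == 0:
        vec = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        vec = (b, lam1 - a)
    norm = math.hypot(*vec)
    loadings = (vec[0] / norm, vec[1] / norm)
    total = lam1 + lam2
    return (lam1 / total, lam2 / total), loadings

# Made-up data: height in metres, weight in grams (deliberately mismatched scales).
heights_m = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]
weights_g = [55000, 72000, 60000, 80000, 68000, 90000]

# Without standardizing, the large-scale feature dominates PC1.
raw_ratio, raw_loadings = pca_2d(heights_m, weights_g)

# After standardizing, both features contribute to PC1.
std_ratio, std_loadings = pca_2d(standardize(heights_m), standardize(weights_g))

print("raw          variance explained:", raw_ratio, "PC1 loadings:", raw_loadings)
print("standardized variance explained:", std_ratio, "PC1 loadings:", std_loadings)
```

On the raw data, PC1 points almost entirely along the weight axis simply because grams produce far larger numbers than metres. After standardizing, the two loadings have equal magnitude, and the variance explained by PC1 reflects how correlated the features are rather than which unit was chosen.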

None selected for this lesson. PCA is well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.