References: Seeing high-dimensional data: t-SNE

Source material (conceptual spine):
• StatQuest with Josh Starmer: "t-SNE, clearly explained"
Creator: Josh Starmer
YouTube: https://www.youtube.com/watch?v=NEaUSP4YerM
Channel / site: https://statquest.org/
License: as published on StatQuest's public YouTube channel (link-out only)
Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos remain with the creator.
  • StatQuest’s “t-SNE, clearly explained” anchors the idea of matching high-dimensional similarities in 2D, the role of perplexity, and the central caveat that t-SNE preserves local but not global structure. The three explicit misreadings (between-cluster distance, cluster size, single-run stability) and the practical guidance to vary perplexity and seeds are built out here as the lesson’s central capability.

The framing of t-SNE as visualization-only (not preprocessing), the contrast table with PCA, and the explicit closing into Phase 4’s evaluation question are Clawdemy’s own connective tissue across the track.

  • StatQuest with Josh Starmer. The t-SNE explainer plus StatQuest’s dimensionality reduction and clustering material.
  • “How to Use t-SNE Effectively” at Distill.pub. A widely shared interactive article that demonstrates exactly the misreadings flagged in this lesson, with live examples you can play with. Strongly recommended after this lesson.
  • UMAP (Uniform Manifold Approximation and Projection). A modern nonlinear dimensionality reduction method that is often faster than t-SNE and tends to preserve more global structure. A common alternative; worth trying alongside t-SNE.
  • PCA + t-SNE pipeline. A common workflow: PCA first to reduce noise and speed up computation, then t-SNE on the reduced data for the final 2D picture.
  • Bias and variance (the next lesson). Phase 4 opens by formalizing the central modeling tension hovering over every choice we have made so far.
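
The PCA + t-SNE workflow listed above, combined with the lesson's advice to vary perplexity and seeds, can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not a recommended configuration: the helper name `pca_then_tsne`, the digits dataset, the 30 PCA components, and the specific perplexity and seed values are all choices made here for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def pca_then_tsne(X, n_pca=30, perplexity=30.0, seed=0):
    """Hypothetical helper: PCA to n_pca dimensions, then t-SNE down to 2D."""
    # Step 1: PCA cuts noise and shrinks the input t-SNE has to process.
    X_reduced = PCA(n_components=n_pca, random_state=seed).fit_transform(X)
    # Step 2: t-SNE on the reduced data. init="random" makes the seed matter,
    # which is what a run-to-run stability check needs.
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="random", random_state=seed)
    return tsne.fit_transform(X_reduced)


# Small subset of the 64-dimensional digits data so the sweep runs quickly
# (an assumption made for this example only).
X = load_digits().data[:300]

# Sweep a few perplexities and seeds instead of trusting a single run.
embeddings = {}
for perplexity in (5, 30):
    for seed in (0, 1):
        embeddings[(perplexity, seed)] = pca_then_tsne(
            X, perplexity=perplexity, seed=seed
        )

for key, emb in embeddings.items():
    print(key, emb.shape)
```

Plotting the four embeddings side by side is the point of the exercise: structure that survives across perplexities and seeds is more trustworthy than structure that appears in only one run.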

The Distill article above is the strongest public discussion of how to read t-SNE plots; little other material adds durable value beyond it. If another canonical discussion surfaces, it will be added at the next review.