
Seeing high-dimensional data: t-SNE

This is lesson 12 of Track 10, the close of Phase 3 (Finding structure without labels). By the end you will be able to read a t-SNE plot without over-reading it: trusting what it does say (which points cluster together) and resisting the urge to read meaning into what it does not (the distance between clusters, the size of a cluster, the exact arrangement on the page). The one capability to walk away with: given any t-SNE figure in modern AI work, you can name which interpretations are valid and which are common misreadings.
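The valid reading, "which points sit near which", can also be checked numerically. Below is a minimal numpy sketch, assuming you have the original features `X_high` and the 2D coordinates `Y_low` from any t-SNE run; the function names are hypothetical and this is one simple way to measure local preservation, not a standard metric. For each point it asks what fraction of its k nearest neighbors in the original space are still among its k nearest neighbors in the plot. Note what it deliberately ignores: distances between clusters and cluster sizes, which are exactly the things t-SNE does not preserve.

```python
import numpy as np

def knn_indices(Z, k):
    """Indices of each row's k nearest neighbors (Euclidean), excluding itself."""
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor
    return np.argsort(D, axis=1)[:, :k]

def neighborhood_overlap(X_high, Y_low, k=10):
    """Average fraction of each point's k high-D neighbors that are also among
    its k neighbors in the low-D plot. 1.0 means local neighborhoods are kept."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(Y_low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(shared)) / k
```

A high score supports reading "these points belong together"; it says nothing about whether two clusters that look far apart really are. scikit-learn ships an established metric in the same spirit, `sklearn.manifold.trustworthiness`.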

The track structurally mirrors StatQuest’s intuition-first machine learning videos. Full attribution is in this lesson’s references.

This closes the unsupervised phase. PCA, covered in the previous lesson, gave you a linear way to compress data: great at preserving variance, but limited to straight axes. t-SNE is the nonlinear partner built specifically for visualization, picking up where PCA's straight axes hide clusters that sit on curved surfaces. Together the two lessons cover the dimensionality-reduction half of unsupervised learning. From here the track turns to its last and most cross-cutting question: how do you know whether any model is actually any good? Phase 4 opens with the bias-variance tradeoff.
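To make "straight axes hide clusters on curved surfaces" concrete, here is a minimal numpy sketch on a hypothetical 2D toy (chosen for readability; real data is high-dimensional): two concentric rings, each cluster lying on a curve. Projecting onto PCA's first axis leaves every inner-ring value inside the outer ring's range, so no single cut along that straight axis separates the rings, while a nonlinear quantity, distance from the center, does.

```python
import numpy as np

# Hypothetical toy data: two concentric rings, each cluster lying on a curve.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([1.0 * circle, 3.0 * circle])  # inner ring (radius 1), then outer (radius 3)
is_outer = np.concatenate([np.zeros(100, dtype=bool), np.ones(100, dtype=bool)])

# PCA down to one dimension: center the data, project onto the top principal axis.
# The rings spread equally in every direction, so which axis PCA picks is arbitrary;
# the conclusion below holds for any straight axis.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
z = Xc @ Vt[0]

inner_z, outer_z = z[~is_outer], z[is_outer]
# The inner ring's projections land strictly inside the outer ring's range,
# so no single threshold on this axis separates the two rings.
print(f"inner: [{inner_z.min():.2f}, {inner_z.max():.2f}]  "
      f"outer: [{outer_z.min():.2f}, {outer_z.max():.2f}]")

# A nonlinear quantity, distance from the center, separates them perfectly.
radius = np.linalg.norm(Xc, axis=1)
print(radius[~is_outer].max() < radius[is_outer].min())  # True
```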

Prerequisite: Lesson 11, Squeezing dimensions: PCA. You need the idea of dimensionality reduction and why visualizing high-dimensional data is hard, because this lesson is the nonlinear visualization counterpart to PCA. No math beyond comparing similarities; the t-SNE algorithm itself is presented at an intuition level.

  • Explain what t-SNE is for (visualization) and not for (preprocessing)
  • Describe at intuition level how it works (matching 2D similarities to high-D ones)
  • Identify what t-SNE preserves (local) and does not (global)
  • Avoid the three common misreadings of t-SNE plots
  • Explain perplexity and the practice of running t-SNE multiple times

  • Read time: about 12 minutes
  • Practice time: about 15 minutes (a valid-vs-misreading exercise, a perplexity-tuning question, and flashcards)
  • Difficulty: standard
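The "matching 2D similarities to high-D ones" idea can be sketched in a few dozen lines of numpy. This is a minimal sketch under simplifying assumptions, not a reference implementation: it follows the standard t-SNE recipe (Gaussian similarities in high-D, each point's width binary-searched so its neighborhood has the requested perplexity; heavy-tailed Student-t similarities in 2D; gradient descent on the KL divergence between the two), but it drops the momentum and early exaggeration real implementations use. The function names and the small default learning rate are hypothetical choices for tiny toy data.

```python
import numpy as np

def pairwise_sq_dists(Z):
    sq = np.sum(Z ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def high_dim_affinities(X, perplexity=5.0, tol=1e-4, max_steps=100):
    """Joint similarities P. Each point gets a Gaussian whose width is
    binary-searched so that 2**entropy of its row equals `perplexity`."""
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    P = np.zeros((n, n))
    target = np.log2(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)              # squared distances to the other points
        beta, lo, hi = 1.0, 0.0, np.inf     # beta = 1 / (2 * sigma_i**2)
        for _ in range(max_steps):
            w = np.exp(-beta * (d - d.min()))  # shift for numerical stability
            p = w / w.sum()
            entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
            if abs(entropy - target) < tol:
                break
            if entropy > target:            # too spread out: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                           # too peaked: widen it
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return (P + P.T) / (2.0 * n)            # symmetrize; entries now sum to 1

def low_dim_affinities(Y):
    """Joint similarities Q in the 2D map, from a Student-t (1 dof) kernel."""
    num = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(num, 0.0)
    return num / num.sum(), num

def kl_divergence(P, Q, eps=1e-12):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

def tsne(X, perplexity=5.0, n_iter=500, learning_rate=1.0, seed=0):
    """Plain gradient descent on KL(P || Q), starting from a tiny random layout."""
    rng = np.random.default_rng(seed)
    P = high_dim_affinities(X, perplexity)
    Y = 1e-4 * rng.standard_normal((X.shape[0], 2))
    history = []
    for _ in range(n_iter):
        Q, num = low_dim_affinities(Y)
        W = (P - Q) * num
        # grad_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        Y -= learning_rate * grad
        history.append(kl_divergence(P, Q))
    return Y, history

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical toy data: two well-separated 10-D blobs of 10 points each.
    X = np.vstack([rng.normal(0.0, 1.0, (10, 10)), rng.normal(10.0, 1.0, (10, 10))])
    for seed in (0, 1, 2):  # several runs: trust structure that recurs
        Y, history = tsne(X, perplexity=5.0, seed=seed)
        print(f"seed={seed}  KL start={history[0]:.3f}  end={history[-1]:.3f}")
```

Perplexity enters only through the width search in `high_dim_affinities`: larger values make each point weigh more neighbors. Different seeds start from different random layouts and can land in different arrangements, which is why the practice is to rerun with several seeds and perplexities and trust only the groupings that keep appearing. For real data, an established implementation such as scikit-learn's `sklearn.manifold.TSNE` (with its `perplexity` and `random_state` parameters) is the sensible choice.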