Lesson: Seeing high-dimensional data: t-SNE
PCA gave us compression. It works wonderfully when the meaningful structure in the data lies along straight axes, and it falls down when it does not. If the points form curves, swirls, or clusters that sit on a non-flat surface, PCA’s straight directions of maximum variance can flatten exactly what you wanted to see, hiding clusters by collapsing them on top of each other.
t-SNE is built for the case PCA struggles with, and for one job in particular: making a 2D picture in which similar high-dimensional points end up close together, so the natural groups in the data jump out at your eye. Its plots are among the most widely shared images in modern AI: the colorful blobs of word embeddings, image features, or single-cell measurements you have seen scattered across a page were very likely made with t-SNE (or its sibling, UMAP). Reading them correctly is a real literacy skill, because the picture comes with a sharp warning label that almost everyone misses.
A visualization tool, not a preprocessor
The first thing to be clear about is what t-SNE is for. PCA was a general-purpose dimensionality reducer: you could plot its output, or you could feed that output into other models. t-SNE is only for visualization. Its goal is to produce a 2D layout where neighbors in high-dimensional space are still neighbors on the page, with similar points clustered together and dissimilar points pushed apart. It is not meant to be a preprocessing step that feeds another model. Use it to see the data; use PCA (or a similar method) to reduce it.
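A minimal sketch of that division of labor, assuming scikit-learn is installed and using its small bundled digits dataset as stand-in data. PCA learns a projection it can reuse on unseen points, so its output can feed a classifier; scikit-learn's `TSNE` offers only `fit_transform` on the points it was given and has no `transform` for new data. The choice of 20 components and of logistic regression is ours, for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)   # 1797 small digit images, 64 pixel features each
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA learns a reusable projection: fit it on training data, then apply it to unseen data.
pca = PCA(n_components=20).fit(X_train)
clf = LogisticRegression(max_iter=2000).fit(pca.transform(X_train), y_train)
test_accuracy = clf.score(pca.transform(X_test), y_test)

# t-SNE only lays out the points it was fitted on; it has no transform() for new points.
tsne = TSNE(n_components=2, random_state=0)
layout = tsne.fit_transform(X_train)   # 2D coordinates, useful for a scatter plot only
has_transform = hasattr(tsne, "transform")

print(f"PCA + logistic regression test accuracy: {test_accuracy:.2f}")
print(f"TSNE has transform(): {has_transform}")
```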
How it works, at the intuition level
t-SNE measures, for every pair of points in the high-dimensional data, how similar they are: close points get a high similarity, distant points a low one. It then places the same points at random positions in 2D (some implementations start from a PCA layout instead) and iteratively nudges those positions so that the pairwise similarities in 2D match the pairwise similarities in the high-dimensional space as closely as possible. Similar points are pulled toward each other; dissimilar points are pushed apart. After many iterations, points that were neighbors in the original space have settled near each other on the page.
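To make that pull-and-push loop concrete, here is a from-scratch NumPy sketch of exact t-SNE on toy data. It is a simplified illustration, not scikit-learn's or the original authors' code: the function names are ours, every step costs O(n²), and the learning rate (sized for this 100-point example), momentum schedule, and early-exaggeration factor are fixed choices rather than tuned ones. High-dimensional similarities come from a Gaussian around each point, with its width set by the perplexity knob covered later in this lesson; 2D similarities use the heavier-tailed Student-t kernel; and each step moves the 2D points downhill on the KL divergence, the mismatch score t-SNE minimizes.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z**2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def high_dim_affinities(X, perplexity=30.0, tol=1e-5, max_steps=50):
    """Joint similarities p_ij from per-point Gaussians whose widths match the perplexity."""
    n = X.shape[0]
    D = pairwise_sq_dists(X)
    P = np.zeros((n, n))
    target = np.log(perplexity)             # entropy in nats for the requested perplexity
    for i in range(n):
        d = np.delete(D[i], i)              # squared distances to every other point
        beta, lo, hi = 1.0, 0.0, np.inf     # beta = 1 / (2 sigma^2), tuned by bisection
        for _ in range(max_steps):
            p = np.exp(-(d - d.min()) * beta)
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if abs(entropy - target) < tol:
                break
            if entropy > target:            # attention too spread out: narrow the Gaussian
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:                           # attention too focused: widen the Gaussian
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    # Symmetrize; the p_ij then sum to (about) 1 over all pairs.
    return np.maximum((P + P.T) / (2.0 * n), 1e-12)

def tsne_2d(X, perplexity=30.0, n_steps=1000, learning_rate=10.0, seed=0):
    """Gradient descent on KL(P || Q) with momentum and early exaggeration."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    P = high_dim_affinities(X, perplexity)
    Y = rng.normal(scale=1e-4, size=(n, 2))            # random starting positions in 2D
    velocity = np.zeros_like(Y)
    for step in range(n_steps):
        exaggeration = 12.0 if step < 250 else 1.0      # early phase: extra pull between neighbors
        momentum = 0.5 if step < 250 else 0.8
        num = 1.0 / (1.0 + pairwise_sq_dists(Y))        # Student-t kernel in 2D
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)
        W = (exaggeration * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y

# Toy data (ours): two well-separated Gaussian blobs in 10 dimensions, 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 10)), rng.normal(10.0, 1.0, (50, 10))])
labels = np.repeat([0, 1], 50)
Y = tsne_2d(X, perplexity=20.0)
```

On this data, each point's nearest neighbor on the page should come from its own blob, even though every position started out random. Real implementations add refinements such as adaptive per-coordinate step sizes and approximate gradients for large datasets.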
The result, on data with real structure, is usually striking: clusters that were buried in hundreds of dimensions show up as visibly separate blobs. The classic example is the MNIST handwritten digits, where each image is a vector of 784 pixel values. PCA on MNIST gives a smeared, partly overlapping 2D plot. t-SNE on the same data produces ten largely separate blobs, one per digit. That is its gift: structure that was invisible becomes obvious.
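The MNIST contrast is easy to reproduce in miniature. This sketch assumes scikit-learn and swaps MNIST for its bundled digits dataset (1797 images of 8×8 pixels, so 64 features instead of 784) to keep the run short. Rather than eyeballing a plot, it scores each 2D layout by how often a point's nearest neighbors on the page show the same digit, a check that relies only on local structure; the scoring function is our own illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

X, y = load_digits(return_X_y=True)   # 1797 images of 8x8 pixels = 64 features each

pca_layout = PCA(n_components=2).fit_transform(X)
tsne_layout = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

def neighbor_label_agreement(emb, labels, k=5):
    """Fraction of each point's k nearest neighbors on the page that show the same digit."""
    idx = NearestNeighbors(n_neighbors=k + 1).fit(emb).kneighbors(emb, return_distance=False)
    # Column 0 is normally the point itself, so it is skipped.
    return np.mean(labels[idx[:, 1:]] == labels[:, None])

pca_agreement = neighbor_label_agreement(pca_layout, y)
tsne_agreement = neighbor_label_agreement(tsne_layout, y)
print(f"PCA:   {pca_agreement:.2f}")
print(f"t-SNE: {tsne_agreement:.2f}")
```

Expect the t-SNE score to come out well above the PCA score; the exact numbers depend on the library version and the seed.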
The sharp warning label
Now the warning. t-SNE is faithful to local structure (who is near whom inside a cluster) and almost entirely careless about global structure (how clusters relate to each other). That carelessness shows up in three ways that, once you internalize them, change how you read every t-SNE plot you ever see.
The distance between clusters is meaningless. Two clusters drawn close together on a t-SNE plot are not necessarily more related than two drawn on opposite sides of the page. The arrangement of clusters across the picture is largely an artifact of the optimization, not a measurement of how the groups differ. Do not point at two adjacent blobs and conclude they are similar; t-SNE does not say that.
The size and density of a cluster are meaningless. A tightly packed blob on the page may not be denser in the original space than a sprawling one; t-SNE often expands tight clusters and shrinks loose ones to fit them on the page. Do not read tightness as homogeneity.
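Both warnings above can be checked on toy data. This sketch assumes scikit-learn and uses synthetic clusters made up for the purpose: a tight cluster A, a cluster B ten times wider, and a cluster C far from both. It compares size and gap ratios before and after t-SNE. `init="random"` matches the random start described earlier; `spread` and `centroid_gap` are hypothetical helper names of our own.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
n, dim = 150, 10
# Synthetic clusters (made up for this check): A is tight, B is ten times wider, C is far away.
A = rng.normal(0.0, 0.1, size=(n, dim))
B = rng.normal(3.0, 1.0, size=(n, dim))
C = rng.normal(50.0, 0.5, size=(n, dim))
X = np.vstack([A, B, C])
groups = np.repeat([0, 1, 2], n)

emb = TSNE(n_components=2, perplexity=30, init="random", random_state=0).fit_transform(X)

def spread(points, labels, g):
    """Mean distance from the points of group g to their own centroid."""
    pts = points[labels == g]
    return np.linalg.norm(pts - pts.mean(axis=0), axis=1).mean()

def centroid_gap(points, labels, g1, g2):
    """Distance between the centroids of groups g1 and g2."""
    c1 = points[labels == g1].mean(axis=0)
    c2 = points[labels == g2].mean(axis=0)
    return np.linalg.norm(c1 - c2)

size_ratio_orig = spread(X, groups, 1) / spread(X, groups, 0)
size_ratio_emb = spread(emb, groups, 1) / spread(emb, groups, 0)
gap_ratio_orig = centroid_gap(X, groups, 0, 2) / centroid_gap(X, groups, 0, 1)
gap_ratio_emb = centroid_gap(emb, groups, 0, 2) / centroid_gap(emb, groups, 0, 1)

print(f"size of B relative to A: original {size_ratio_orig:.1f}, t-SNE {size_ratio_emb:.1f}")
print(f"gap A-C relative to A-B: original {gap_ratio_orig:.1f}, t-SNE {gap_ratio_emb:.1f}")
```

Expect t-SNE to report the three clusters at far more similar sizes and spacings than the original data has; the exact numbers vary with the seed.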
Different runs give different layouts. Re-run t-SNE with a different random seed and you get a picture that may look quite different in arrangement, even with the same data. The clusters that appear in multiple runs are usually real; the way they are arranged on the page is not.
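A sketch of that check, assuming scikit-learn: fit the same 500-image subset of the bundled digits data with two seeds, then compare the raw coordinates and a neighborhood-preservation score. Recent scikit-learn versions default to a PCA-based start (`init="pca"`), which makes runs more repeatable; `init="random"` is set here to match the random start described earlier. `trustworthiness` is scikit-learn's measure of how well an embedding keeps each point's original nearest neighbors.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

X, _ = load_digits(return_X_y=True)
X = X[:500]   # a subset keeps the example quick

# Same data, same settings, two different seeds.
layouts = [
    TSNE(n_components=2, init="random", random_state=seed).fit_transform(X)
    for seed in (0, 1)
]

# The raw coordinates do not line up between runs...
mean_coord_shift = np.abs(layouts[0] - layouts[1]).mean()
# ...yet each layout keeps most local neighborhoods (trustworthiness: 1.0 is perfect).
trust_scores = [trustworthiness(X, emb, n_neighbors=5) for emb in layouts]
print(f"mean coordinate shift between seeds: {mean_coord_shift:.1f}")
print("trustworthiness per seed:", [round(t, 3) for t in trust_scores])
```

A large coordinate shift is not itself a sign of trouble: even a mirrored copy of the same layout would produce one. Compare neighborhoods across runs, not positions.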
What t-SNE does tell you, fairly reliably, is which points and groups are similar enough to cluster together. The picture is a clustering snapshot, not a map.
The perplexity dial
t-SNE has one main tuning knob, called perplexity. It controls roughly how many neighbors each point pays attention to when computing similarities. Set perplexity too low and the picture fractures into many tiny clusters; set it too high and the structure blurs toward one blob. Common values lie between 5 and 50. The honest practice is to try several perplexities and trust the cluster structure that appears stably across them. If a “cluster” shows up at only one perplexity setting, treat it with suspicion.
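To see what the knob measures: the perplexity of a point's similarity distribution is 2 raised to its entropy in bits, which works out to an effective neighbor count. The NumPy sketch that follows (function names ours, toy distances random) shows that spreading attention evenly over 10 neighbors gives exactly 10, and that a wider Gaussian raises the count. t-SNE runs this in reverse: for each point it searches for the Gaussian width whose perplexity matches the value you asked for.

```python
import numpy as np

def perplexity(p):
    """2 ** (Shannon entropy in bits): the effective number of neighbors p spreads over."""
    p = p[p > 0]
    return 2.0 ** (-np.sum(p * np.log2(p)))

def neighbor_probs(sq_dists, sigma):
    """Gaussian similarities of one point to its neighbors, normalized to sum to 1."""
    w = np.exp(-(sq_dists - sq_dists.min()) / (2.0 * sigma**2))
    return w / w.sum()

# Spreading attention evenly over 10 neighbors gives perplexity 10.
print(perplexity(np.full(10, 0.1)))   # ~10.0

# Toy distances (ours): one point at the origin and 200 random neighbors in 5 dimensions.
rng = np.random.default_rng(0)
sq_dists = np.sum(rng.normal(size=(200, 5)) ** 2, axis=1)
perps = [perplexity(neighbor_probs(sq_dists, s)) for s in (0.3, 1.0, 3.0)]
print([round(x, 1) for x in perps])   # wider Gaussian -> larger effective neighborhood
```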
When NOT to reach for t-SNE
A few cautions worth flagging now, before you meet t-SNE plots in the wild:
- Do not use it for preprocessing. Reducing data to 2D with t-SNE and feeding that into a classifier is a misuse; the 2D output is shaped for plots, not for modeling.
- It is slow on very large datasets. Hundreds of thousands of points can take a long time.
- There is a related alternative called UMAP that usually runs faster and tends to preserve more of the global structure. If you have a choice, try both.
- Always run it more than once. A single t-SNE picture from a single seed is one of several possible layouts; what is real is what shows up across runs.
Why this matters when you use AI
You have almost certainly seen a t-SNE plot already without knowing it. Pictures of word embeddings clustering by topic, vision-model features grouping by category, or single cells separating into types by gene expression: most of them are made with t-SNE or UMAP. Reading them correctly, taking the clusters seriously but the inter-cluster geometry skeptically, is one of those small literacy skills that protects you from confidently wrong interpretations. The next time someone shows you a beautiful t-SNE figure and says “these two clusters are close, so they must be related,” you will know to ask whether that closeness holds up across different perplexities and seeds, or whether it is just where the optimization happened to land them this time.
Common pitfalls
- Reading between-cluster distance as meaningful. It is not. The arrangement of clusters on the page is mostly arbitrary.
- Reading cluster size as meaningful. It is not. t-SNE expands and shrinks clusters to lay them out; tightness on the page is not density in the data.
- Trusting one run. A single seed at a single perplexity is one of many possible pictures. Vary both, and trust what is stable.
- Using t-SNE for preprocessing. It is a visualization method; to prepare inputs for a downstream model, use PCA or another reducer, not t-SNE.
What you should remember
- t-SNE is for visualization only, producing a 2D picture in which similar high-dimensional points end up near each other.
- It preserves local structure (who is near whom inside a cluster) and does not preserve global structure (the distances or arrangement between clusters).
- Cluster positions and sizes on a t-SNE plot are not meaningful; only the fact that points clustered together is.
- Vary perplexity and the random seed, and trust the structure that shows up stably across runs.
Phase 3 has now covered both halves of unsupervised learning: clustering, in two flavors, and dimensionality reduction, in a linear form (PCA) and a nonlinear one (t-SNE). Across the whole track we have built a substantial toolbox: regression, classification, ensembles, clustering, compression. But there is a question we have only gestured at throughout, the one that decides whether any of this is actually helping: how do you know if a model is any good? The last phase of the track is devoted to it, beginning with the central tension behind every modeling decision: the bias-variance tradeoff.