Lesson: The stubborn vectors, eigenvectors and eigenvalues

When a linear transformation moves the plane, watch what happens to a single vector. Most vectors get knocked off their own line: the arrow ends up pointing in a different direction than it started. But for a given transformation, a few special vectors are stubborn. They stay on the same line they started on, getting stretched longer, squished shorter, or flipped to point backward, but never rotated off their line.

Those stubborn vectors are called eigenvectors, and the factor by which each one gets scaled is its eigenvalue. They are the deepest idea in this track, because they reveal what a transformation is actually doing beneath whatever basis you happened to write it in. This lesson finds them, both by eye and by formula, and delivers on the promise from the change-of-basis lesson: eigenvectors are the basis in which a transformation looks as simple as it possibly can.

An eigenvector of a matrix M is a nonzero vector that M merely scales, without knocking it off its line:

M · v = λ · v

The scalar lambda is the eigenvalue: the factor by which the vector gets scaled. The equation says something strong. On the left, M does its full transformation to the vector. On the right, all that happened was multiplication by a single number. For an eigenvector, the entire matrix acts like a plain scalar. No rotation, no shear, just scaling.
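The definition can be checked mechanically. The sketch below is a minimal illustration, not part of the lesson's own material: the helper names `mat_vec` and `is_eigenpair` and the tolerance are assumptions made here, and matrices are written as two rows. It uses the matrix [[2, 1], [1, 2]] that the lesson works with later.

```python
def mat_vec(M, v):
    """Apply a 2x2 matrix (given as two rows) to a 2D vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def is_eigenpair(M, v, lam, tol=1e-9):
    """True when v is nonzero and M·v equals lam·v (within tol)."""
    if abs(v[0]) < tol and abs(v[1]) < tol:
        return False  # the zero vector is excluded by definition
    out = mat_vec(M, v)
    return abs(out[0] - lam * v[0]) < tol and abs(out[1] - lam * v[1]) < tol

M = [[2, 1], [1, 2]]
print(is_eigenpair(M, [1, 1], 3))  # True: M·[1, 1] = [3, 3]
print(is_eigenpair(M, [1, 0], 2))  # False: M·[1, 0] = [2, 1], off the x-axis
```

The explicit zero-vector check matters: the zero vector satisfies the equation for every lambda, which is why the definition rules it out.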

The geometric way to hunt for them: imagine sweeping an input vector all the way around the unit circle and watching where the transformation sends it. For most input directions, the output points somewhere off the input’s line. At a few special directions, the output snaps back onto the same line. Those directions are the eigenvectors, and how much the transformation stretched or flipped them is the eigenvalue.

[Figure: Sweep an input vector around the unit circle: at eigenvector directions the output is collinear with the input. Shown for M = [[2, 1], [1, 2]], with eigen lines along [1, 1] (λ = 3) and [1, -1] (λ = 1).]
Sweep an input vector around the unit circle; for most directions, M sends the output off that direction's line. At exactly two directions (along [1, 1] and [1, -1] here), the output stays on the same line as the input. Those are the eigenvectors; the stretching factors 3 and 1 are the eigenvalues.
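The sweep can be imitated numerically. This is a rough sketch under stated assumptions: the function name `eigen_directions`, the one-degree step, and the tolerance are choices made here, and collinearity is tested with the 2D cross product (zero when two vectors share a line). A fixed grid only catches eigenvector directions that land on a sampled angle, so treat it as an illustration rather than a general method.

```python
import math

def mat_vec(M, v):
    """Apply a 2x2 matrix (given as two rows) to a 2D vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def eigen_directions(M, steps=360, tol=1e-9):
    """Sweep unit vectors over half the circle; return the angles (in degrees)
    at which M·v lies on the same line as v."""
    found = []
    for k in range(steps // 2):  # opposite vectors share a line, so half a turn suffices
        theta = 2 * math.pi * k / steps
        v = [math.cos(theta), math.sin(theta)]
        out = mat_vec(M, v)
        cross = v[0] * out[1] - v[1] * out[0]  # zero when out is collinear with v
        if abs(cross) < tol:
            found.append(round(math.degrees(theta)))
    return found

print(eigen_directions([[2, 1], [1, 2]]))  # [45, 135]
```

The angle 45 degrees is the line through [1, 1]; 135 degrees is the line through [-1, 1], which is the same line as the one through [1, -1].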

Some transformations let you spot the eigenvectors without any calculation.

A pure stretch. Take M with first column 2, 0 and second column 0, 3, which stretches the x-direction by 2 and the y-direction by 3. Look at i-hat, the vector 1, 0: the matrix sends it to 2, 0, which is 2 times i-hat, still on the x-axis. So i-hat is an eigenvector with eigenvalue 2. Likewise j-hat, the vector 0, 1, goes to 0, 3, which is 3 times j-hat, an eigenvector with eigenvalue 3. Here the standard basis is already the eigenvector basis, and, not coincidentally, the matrix is already diagonal: the eigenvalues sit right on the diagonal.

A rotation. Take the 90-degree rotation R with first column 0, 1 and second column negative-1, 0. Every nonzero vector gets spun a quarter turn, so no vector stays on its own line. A 2D rotation by any angle other than 0 or 180 degrees has no real eigenvectors at all. (It has eigenvalues in the complex numbers, which is how rotations are handled in more advanced settings, but that is beyond this lesson; in the real plane, there are simply none.)

A shear. Take the shear S with first column 1, 0 and second column 1, 1. The vector i-hat stays put at 1, 0, an eigenvector with eigenvalue 1. But j-hat gets sheared to 1, 1, off its line, so it is not an eigenvector. The shear has only one eigenvector direction, the x-axis. This is a degenerate case worth seeing: a transformation need not have a full set of independent eigenvectors.

[Figure: Three transformations contrasted by eigenvector structure. Panels: stretch (î and ĵ both stay on their axes; 2 eigenvectors), rotation 30° (every vector rotates off its line; no real eigenvectors), shear (only î stays, on the x-axis; 1 eigenvector).]
Three transformations, three eigen-stories. A stretch keeps both basis vectors on their own axes, so it has two eigenvectors. A pure rotation rotates everything off its line, so it has none (over the reals). A shear keeps only one axis fixed, so it has one. The geometry tells you the count.
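The three by-eye cases can be replayed in a few lines. In this sketch the helper name `stays_on_line` is an assumption introduced here; the matrices are the stretch, 90-degree rotation, and shear from the text, rewritten as rows. It only tests the two basis vectors, which is all the by-eye argument used, so it does not prove that no other direction qualifies.

```python
def mat_vec(M, v):
    """Apply a 2x2 matrix (given as two rows) to a 2D vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def stays_on_line(M, v):
    """True when M·v is collinear with v (2D cross product equals zero)."""
    out = mat_vec(M, v)
    return v[0] * out[1] - v[1] * out[0] == 0

# Columns from the text, written here as rows.
cases = {
    "stretch":  [[2, 0], [0, 3]],
    "rotation": [[0, -1], [1, 0]],
    "shear":    [[1, 1], [0, 1]],
}

i_hat, j_hat = [1, 0], [0, 1]
for name, M in cases.items():
    print(name, stays_on_line(M, i_hat), stays_on_line(M, j_hat))
# stretch True True
# rotation False False
# shear True False
```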

By eye works for tidy matrices; for the rest, there is a method. Start from the definition and rearrange:

M · v = λ · v ⟹ M · v - λ · v = 0 ⟹ (M - λI) · v = 0

where I is the identity matrix, so that lambda I is the matrix that scales everything by lambda. We want a nonzero vector that (M minus lambda I) sends to the zero vector. From the inverses lesson, a matrix sends a nonzero vector to zero exactly when it collapses space, which happens exactly when its determinant is zero. So the eigenvalues are the values of lambda that make the determinant zero:

det(M - λI) = 0

This is the characteristic equation. Solve it for lambda, then for each lambda find the vectors that (M minus lambda I) crushes to zero, the null space from the inverses lesson, and those are the eigenvectors.
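For a 2x2 matrix with rows a, b and c, d, expanding the determinant of (M minus lambda I) gives the quadratic λ² - (a + d)·λ + (ad - bc) = 0, so the eigenvalues come straight from the quadratic formula. The sketch below assumes that expansion; the function name `eigenvalues_2x2` is hypothetical, and it reports only real roots, returning an empty list when there are none.

```python
import math

def eigenvalues_2x2(M):
    """Real eigenvalues of a 2x2 matrix (two rows), from det(M - λI) = 0."""
    (a, b), (c, d) = M
    trace = a + d              # coefficient of -λ in the quadratic
    det = a * d - b * c        # constant term
    disc = trace * trace - 4 * det
    if disc < 0:
        return []              # no real roots, hence no real eigenvalues
    root = math.sqrt(disc)
    # A set removes the duplicate when the two roots coincide.
    return sorted({(trace - root) / 2, (trace + root) / 2})

print(eigenvalues_2x2([[2, 0], [0, 3]]))   # [2.0, 3.0]  the stretch
print(eigenvalues_2x2([[0, -1], [1, 0]]))  # []          the 90-degree rotation
print(eigenvalues_2x2([[1, 1], [0, 1]]))   # [1.0]       the shear
```

The three results line up with the by-eye counts: two eigenvalues for the stretch, none over the reals for the rotation, and a single repeated one for the shear.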

Take M with first row 3, 1 and second row 0, 2 and find everything.

First the eigenvalues. Subtract lambda down the diagonal:

M - λI = [[3-λ, 1], [0, 2-λ]]

Its determinant factors on sight, and setting it to zero gives the two eigenvalues:

det(M - λI) = (3-λ)(2-λ) - (1)(0) = (3-λ)(2-λ) = 0 -> λ = 3, λ = 2

Now the eigenvector for each. For lambda equals 3, form M minus 3I, the matrix with first row 0, 1 and second row 0, negative-1, and solve for the vectors it sends to zero. Both rows say the y-coordinate is zero, with the x-coordinate free, so the eigenvectors are everything along the vector 1, 0. For lambda equals 2, form M minus 2I, the matrix with first row 1, 1 and second row 0, 0, and solve: the top row says the x-coordinate plus the y-coordinate equals zero, so the y-coordinate is the negative of the x-coordinate, giving the line through 1, negative-1.

Check both against the definition:

M · [1, 0] = [3, 0] = 3 · [1, 0] eigenvalue 3, confirmed
M · [1, -1] = [2, -2] = 2 · [1, -1] eigenvalue 2, confirmed

The second check spelled out: M applied to the vector 1, negative-1 gives “3 times 1 plus 1 times negative-1, then 0 times 1 plus 2 times negative-1,” which is the vector 2, negative-2, indeed 2 times 1, negative-1. Two eigenvalues, two eigenvector directions, all verified.
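The null-space step can be sketched for the 2x2 case. This is one possible approach, not the lesson's own code: the name `eigenvector_2x2` is hypothetical, and it assumes the lambda passed in really is an eigenvalue, so that M minus lambda I is singular. A nonzero row p, q of that matrix gives the equation p·x + q·y = 0, which the vector [q, -p] satisfies.

```python
def eigenvector_2x2(M, lam, tol=1e-9):
    """One representative vector that (M - lam·I) sends to zero.
    Assumes lam is an eigenvalue of the 2x2 matrix M (given as two rows)."""
    p, q = M[0][0] - lam, M[0][1]
    r, s = M[1][0], M[1][1] - lam
    if abs(p) > tol or abs(q) > tol:
        return [q, -p]   # solves p·x + q·y = 0
    if abs(r) > tol or abs(s) > tol:
        return [s, -r]   # first row was all zeros; use the second row
    return [1, 0]        # M - lam·I is the zero matrix: every direction works

M = [[3, 1], [0, 2]]
print(eigenvector_2x2(M, 3))  # [1, 0]
print(eigenvector_2x2(M, 2))  # [1, -1]
```

Any nonzero multiple of the returned vector is an equally valid answer; the function simply picks one representative of the line.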

That matrix was upper-triangular, which made the determinant factor on sight. Here is one where it does not. Take M with first row 2, 1 and second row 1, 2. Then M minus lambda I has first row 2 minus lambda, 1 and second row 1, 2 minus lambda, and its determinant works out as below:

det(M - λI) = (2-λ)(2-λ) - (1)(1) = (2-λ)^2 - 1 = 0
(2-λ)^2 = 1 -> 2-λ = 1 or 2-λ = -1 -> λ = 1 or λ = 3

For lambda equals 3, M minus 3I has first row negative-1, 1 and second row 1, negative-1, and the equation “negative the x-coordinate plus the y-coordinate equals zero” gives the y-coordinate equal to the x-coordinate, the line through 1, 1. For lambda equals 1, M minus I has first row 1, 1 and second row 1, 1, and “the x-coordinate plus the y-coordinate equals zero” gives the line through 1, negative-1. Check:

M · [1, 1] = [3, 3] = 3 · [1, 1]
M · [1, -1] = [1, -1] = 1 · [1, -1]

The characteristic equation handled a matrix whose eigenvectors were nowhere obvious by eye.

[Figure: The two eigenvector lines for M = [[3, 1], [0, 2]]: the x-axis with λ = 3 (input [1, 0], output [3, 0]) and the line through (1, -1) with λ = 2 (input [1, -1], output [2, -2]).]
For M = [[3, 1], [0, 2]] the eigenvector lines are the x-axis (stretching by 3) and the line through (1, -1) (stretching by 2). On these two lines and only these two, M acts like a plain scalar: the output is the input rescaled, never rotated off its line.

This is what the change-of-basis lesson promised. When a 2x2 matrix has two independent eigenvectors, they form a basis, and in that basis the transformation becomes pure scaling along the axes, the simplest possible matrix: a diagonal one.

Build P, the matrix whose columns are the eigenvectors, and change basis with the sandwich from last lesson:

P = [[1, 1], [0, -1]],   D = P^-1 · M · P

Run it through for our M with first row 3, 1 and second row 0, 2. The inverse of P (using the 2x2 shortcut from last lesson, with the determinant of P equal to negative 1) happens to be P itself: P-inverse has first row 1, 1 and second row 0, negative-1. Compute M times P first, by applying M to each column of P, then apply P-inverse to each result:

M · P: M · [1, 0] = [3, 0], M · [1, -1] = [2, -2] -> M · P = columns [3, 0], [2, -2]
P^-1 · (M · P): gives columns [3, 0] and [0, 2]

So the change of basis lands on a diagonal matrix with the eigenvalues on the diagonal:

D = P^-1 · M · P = [[3, 0], [0, 2]]

In the eigenvector basis, the messy-looking M with first row 3, 1 and second row 0, 2 is revealed as nothing more than “stretch by 3 along one eigenvector, by 2 along the other.” The shear-like off-diagonal entry was an artifact of describing the transformation in the standard basis. Change to the basis the transformation itself prefers, and it turns transparent.
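The sandwich can be run numerically as a check. In this minimal sketch the helpers `mat_mul` and `inverse_2x2` are names introduced here, and the inverse uses the standard 2x2 formula (swap the diagonal, negate the off-diagonal, divide by the determinant), assumed to be the shortcut the text refers to.

```python
def mat_mul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the determinant formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

M = [[3, 1], [0, 2]]
P = [[1, 1], [0, -1]]        # columns are the eigenvectors [1, 0] and [1, -1]

P_inv = inverse_2x2(P)
D = mat_mul(P_inv, mat_mul(M, P))
print(P_inv)
print(D)
```

P_inv comes out numerically equal to P, and D has 3 and 2 on the diagonal with zeros elsewhere, matching the hand computation. Swapping the columns of P would swap the order of the eigenvalues on the diagonal.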

That is the deep reason eigenvectors matter: they are the natural coordinate system of a transformation. The eigenvalues are what it actually does, stripped of the arbitrary basis you first wrote it in.

Eigenvalue and eigenvector analysis is genuinely central across machine learning, more so than the cross product was.

Principal Component Analysis is the headline example. PCA finds the eigenvectors of a dataset’s covariance matrix; those eigenvectors point along the directions of greatest variation in the data, the principal components, and the eigenvalues say how much variation each captures. Keeping the few eigenvectors with the largest eigenvalues compresses the data while preserving most of its spread. The change-of-basis and eigenvector ideas from these two lessons are exactly the machinery PCA runs on.
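A toy version of that idea fits in a few lines. Everything specific here is an assumption for illustration: the four data points are made up, the covariance divides by n (the population form) rather than n - 1, and the helper names are hypothetical. Real PCA code would normally rely on a numerical library rather than the 2x2 quadratic formula.

```python
import math

def covariance_2d(points):
    """Population covariance matrix (divide by n) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return [[sxx, sxy], [sxy, syy]]

def top_eigenpair(C):
    """Largest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    (a, b), (_, d) = C
    trace, det = a + d, a * d - b * b
    lam = (trace + math.sqrt(trace * trace - 4 * det)) / 2
    if abs(b) > 1e-12:
        v = [b, lam - a]     # satisfies (a - lam)·x + b·y = 0
    else:
        v = [1.0, 0.0] if a >= d else [0.0, 1.0]  # already diagonal
    norm = math.hypot(v[0], v[1])
    return lam, [v[0] / norm, v[1] / norm]

# Made-up data, spread mostly along the line y = x.
points = [(2, 1), (1, 2), (-2, -1), (-1, -2)]
C = covariance_2d(points)
lam, v = top_eigenpair(C)
print(C)                          # [[2.5, 2.0], [2.0, 2.5]]
print(lam)                        # 4.5
print(lam / (C[0][0] + C[1][1]))  # 0.9, the share of total variance captured
```

The principal direction `v` points along [1, 1], and its eigenvalue accounts for 90 percent of the total variance in this toy dataset, so projecting onto that single direction would preserve most of the spread.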

Exploding and vanishing gradients, a central difficulty in training deep and recurrent networks, are an eigenvalue story. When a signal passes repeatedly through the same weight matrix, its component along each eigenvector gets scaled by that eigenvector's eigenvalue each time. Eigenvalues with magnitude above 1 make that component blow up over many steps; eigenvalues with magnitude below 1 make it decay toward nothing. Stable training depends heavily on keeping those eigenvalue magnitudes near 1.
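The effect is easy to see on a small example. The matrix W below is invented for illustration, not taken from any real network: it has eigenvalue 1.1 along [1, 1] and eigenvalue 0.9 along [1, -1]. The step count of 50 and the helper names are likewise assumptions.

```python
import math

def mat_vec(M, v):
    """Apply a 2x2 matrix (given as two rows) to a 2D vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def apply_repeatedly(M, v, steps):
    """Send v through the same matrix `steps` times."""
    for _ in range(steps):
        v = mat_vec(M, v)
    return v

W = [[1.0, 0.1], [0.1, 1.0]]   # eigenvalues 1.1 (along [1, 1]) and 0.9 (along [1, -1])

grown = apply_repeatedly(W, [1.0, 1.0], 50)    # scaled by 1.1 at every step
shrunk = apply_repeatedly(W, [1.0, -1.0], 50)  # scaled by 0.9 at every step

print(math.hypot(*grown))    # more than 100 times the starting length
print(math.hypot(*shrunk))   # less than one hundredth of the starting length
```

Eigenvalues only ten percent away from 1 already produce a gap of more than four orders of magnitude after 50 steps.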

Spectral methods in graph neural networks use the eigenvectors of a graph’s connectivity matrix to encode its structure, and classic algorithms like PageRank are, underneath, the search for a particular eigenvector. Once you know to look, eigenvalue problems are everywhere.

Forgetting that an eigenvector stands for a whole line, not a single arrow. If a vector is an eigenvector, so is any nonzero scalar multiple of it: scaling it does not change the line it lies on. An eigenvector names a direction (a line through the origin), and people usually pick one representative vector on it.

Expecting every matrix to have real eigenvectors. Rotations in the real plane (by anything other than 0 or 180 degrees) have none. Shears have only one direction instead of two. A full set of independent eigenvectors is common but not guaranteed.

Misreading a zero eigenvalue. An eigenvalue of zero is allowed and meaningful: it means the transformation crushes that eigenvector’s direction to the origin, the collapse from the determinant and inverses lessons. Zero eigenvalue, not invertible.

Mixing up eigenvalue and eigenvector. The eigenvalue is the number (the scaling factor); the eigenvector is the direction that gets scaled. In the equation “M times the vector equals lambda times the vector,” the vector is what gets scaled and lambda is the number.
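Two of these pitfalls can be checked directly. The sketch below is illustrative only: the scale factors are arbitrary choices, and the matrix with rows 1, 1 and 1, 1 (which appeared earlier as M minus I) is used here as a small example of a matrix with a zero eigenvalue.

```python
def mat_vec(M, v):
    """Apply a 2x2 matrix (given as two rows) to a 2D vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

# Pitfall: an eigenvector is a whole line. Every nonzero multiple of [1, 1]
# is scaled by the same eigenvalue 3.
M = [[2, 1], [1, 2]]
for scale in (1, 5, -2, 0.5):
    v = [scale, scale]
    print(v, "->", mat_vec(M, v), mat_vec(M, v) == [3 * v[0], 3 * v[1]])

# Pitfall: a zero eigenvalue means collapse. This matrix sends [1, -1] to the
# origin, and its determinant is zero, so it is not invertible.
C = [[1, 1], [1, 1]]
print(mat_vec(C, [1, -1]))                      # [0, 0]
print(C[0][0] * C[1][1] - C[0][1] * C[1][0])    # 0
```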

  • An eigenvector is a nonzero vector the transformation only scales, not rotates: M times the vector equals lambda times the vector. The eigenvalue lambda is the scaling factor. Geometrically, eigenvectors are the directions that stay on their own line through the transformation; the eigenvalue is how much they stretch (or flip, if negative, or collapse, if zero).
  • Find eigenvalues by solving the characteristic equation, the determinant of (M minus lambda I) equal to zero, then find each eigenvector as the null space of M minus lambda I. This works because a nonzero vector maps to zero only when the matrix collapses, which is the zero-determinant condition from the inverses lesson.
  • In the eigenvector basis, the transformation is diagonal: D equals P-inverse times M times P has the eigenvalues on its diagonal, where the columns of P are the eigenvectors. This is the simplest a transformation can look, the basis it naturally prefers, and the reason eigenvectors are the workhorse behind PCA, gradient stability, and much more.

Most vectors are pushed around by a transformation; the eigenvectors hold their ground and only scale. Find them and you have found the transformation’s true axes, the directions along which it is nothing more than stretching. The final lesson steps back from arrows entirely and asks what survives when “vector” means something that is not an arrow at all.