Lesson: Linear transformations as moves

Last lesson ended on a clean idea: every vector in the plane is a unique combination of two basis vectors, i-hat and j-hat. The vector 3, 4 is just “3 of i-hat plus 4 of j-hat.” Hold onto that, because this lesson does one thing with it: it moves the basis vectors, and watches everything else move along with them.

That is what a matrix is, underneath the grid of numbers. Not a table, not a spreadsheet. A record of where the basis vectors went. Once you see it that way, a matrix stops being something you compute with by rote and becomes something you can picture.

A transformation is just a function that takes a vector in and gives a vector out. The word “transformation” instead of “function” is a hint to picture it as motion: every point in the plane picks up and moves to a new spot, all at once.

Most such motions are chaotic. Linear algebra cares about a specific, well-behaved kind called a linear transformation, defined by two requirements:

  1. The origin stays fixed. The zero vector maps to the zero vector. The center of the plane does not move.
  2. Straight lines stay straight. No line gets bent into a curve. A grid of evenly spaced parallel lines stays a grid of evenly spaced parallel lines, just possibly rotated, stretched, or sheared.

Figure: Linear versus non-linear transformations. Left panel (LINEAR): the grid stays parallel and evenly spaced. Right panel (NON-LINEAR): the grid lines curve.
A linear transformation tilts and stretches the grid, but every square becomes a parallelogram: lines that were parallel before are still parallel after, and equally spaced steps stay equally spaced. The moment the grid curves, the transformation is non-linear.

That second requirement is the one to hold in your eye. If you imagine the plane as graph paper, a linear transformation can stretch it, rotate it, flip it, or shear it into a slanted grid, but it can never curve the lines or bunch them unevenly. Parallel and evenly spaced, before and after.
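
The two requirements can also be spot-checked with arithmetic: for a linear transformation, transforming a scaled sum of vectors gives the same result as scaling and summing the transformed vectors. The sketch below is a rough illustration rather than a proof. The names `shear`, `bend`, and `looks_linear` are made up for this example, and the check covers only a handful of sample vectors.

```python
# Spot-check linearity: f(a*u + b*v) should equal a*f(u) + b*f(v),
# and the origin should stay put. All names here are hypothetical.

def shear(v):
    """A linear map: slides points right in proportion to their height."""
    x, y = v
    return (x + y, y)

def bend(v):
    """A non-linear map: adding x*x bends horizontal grid lines into parabolas."""
    x, y = v
    return (x, y + x * x)

def looks_linear(f, samples, scalars):
    """Return True if f fixes the origin and preserves every tested scaled sum."""
    if f((0, 0)) != (0, 0):
        return False
    for u in samples:
        for v in samples:
            for a in scalars:
                for b in scalars:
                    combo = (a * u[0] + b * v[0], a * u[1] + b * v[1])
                    fu, fv = f(u), f(v)
                    expected = (a * fu[0] + b * fv[0], a * fu[1] + b * fv[1])
                    if f(combo) != expected:
                        return False
    return True

samples = [(1, 0), (0, 1), (3, 4), (-2, 5)]
scalars = [-1, 2, 3]

print(looks_linear(shear, samples, scalars))  # True
print(looks_linear(bend, samples, scalars))   # False
```

Passing this check on a few samples does not prove a map is linear, but failing it once is enough to rule linearity out.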

Here is the move that makes all of this tractable, and it is genuinely surprising the first time. To know what a linear transformation does to every vector in the plane, you only need to know what it does to two of them: i-hat and j-hat.

The reason comes straight from last lesson. Any vector is a linear combination of the basis vectors:

v = x · i-hat + y · j-hat

A linear transformation preserves that combination. Because lines stay evenly spaced and the origin stays fixed, scaling and adding survive the transformation intact: if you transform the vector, you get the exact same combination of the transformed basis vectors.

L(v) = x · L(i-hat) + y · L(j-hat)

Read that slowly. The output uses the same two scalars, x and y, that were the coordinates of the original vector. All that changed is which two vectors they are scaling: not the original i-hat and j-hat, but wherever the transformation sent them. Track those two landing spots and you can reconstruct the destination of any vector at all.
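
As a minimal sketch of that formula, the snippet below rebuilds L(v) from nothing but the two landing spots. The function name `transform` and the landing spots (2, -1) and (1, 1) are assumptions picked for illustration.

```python
# Rebuild L(v) = x * L(i-hat) + y * L(j-hat) from the two landing spots.

def transform(v, i_landed, j_landed):
    """Return x * i_landed + y * j_landed for v = (x, y)."""
    x, y = v
    return (x * i_landed[0] + y * j_landed[0],
            x * i_landed[1] + y * j_landed[1])

i_landed = (2, -1)  # where i-hat ends up (assumed for this example)
j_landed = (1, 1)   # where j-hat ends up (assumed for this example)

print(transform((1, 0), i_landed, j_landed))  # (2, -1): i-hat goes to its landing spot
print(transform((0, 1), i_landed, j_landed))  # (1, 1): j-hat goes to its landing spot
print(transform((1, 2), i_landed, j_landed))  # (4, 1): 1*(2, -1) + 2*(1, 1)
```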

Figure: A linear transformation is determined by what it does to the basis. Before: î, ĵ, and v = î + 2ĵ = [1, 2]. After L: L(î) = [2, -1], L(ĵ) = [1, 1], and L(v) = 1·L(î) + 2·L(ĵ) = [4, 1]. The transformed grid lines sit faintly behind, showing where the source grid lands.
The same arithmetic that built v from î and ĵ rebuilds L(v) from L(î) and L(ĵ). Track where the two basis vectors land, and the whole transformation follows. That is why a 2x2 matrix, just two columns naming L(î) and L(ĵ), is enough to encode any 2D linear transformation.

A matrix is the record of where the basis landed

So a linear transformation is fully captured by two vectors: where i-hat goes and where j-hat goes. A 2x2 matrix is just the compact way to write those two vectors down, side by side, as columns.

[ a b ]
[ c d ]

The first column, the vector a, c, is where i-hat lands. The second column, the vector b, d, is where j-hat lands. That is the entire content of the matrix: column one is L of i-hat, column two is L of j-hat. Everything else about matrices, all the rules that look arbitrary when you first meet them, falls out of this one fact.

This also tells you exactly what it means to multiply a matrix by a vector. The product of a matrix M and a vector with components x and y is defined to be

x · (first column) + y · (second column)

which is precisely the x-coordinate times L of i-hat plus the y-coordinate times L of j-hat, the destination of the vector. Matrix-vector multiplication is not a strange new rule to memorize. It is the linear-combination idea from last lesson, applied to the transformed basis vectors. The columns are the transformed basis vectors, and the product just reassembles your vector out of them.
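
To see that the column reading and the usual entry-by-entry rule agree, here is a small sketch that computes the product both ways. The helper names are hypothetical, and storing the matrix as a tuple of rows `((a, b), (c, d))` is just a convention assumed for this example.

```python
# Matrix-vector product two ways: as a combination of columns, and by the
# familiar row-by-row rule. Both must give the same answer.

def columns(m):
    """Split ((a, b), (c, d)) into its columns (a, c) and (b, d)."""
    (a, b), (c, d) = m
    return (a, c), (b, d)

def matvec_by_columns(m, v):
    """x * (first column) + y * (second column)."""
    col1, col2 = columns(m)
    x, y = v
    return (x * col1[0] + y * col2[0], x * col1[1] + y * col2[1])

def matvec_by_rows(m, v):
    """The rote rule: each output entry is a row combined with the vector."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)

m = ((2, 1), (0, 2))  # columns are (2, 0) and (1, 2)
v = (3, 4)

print(matvec_by_columns(m, v))  # (10, 8)
print(matvec_by_rows(m, v))     # (10, 8)
```

The row rule is the same arithmetic regrouped, which is why it can look arbitrary while the column version reads as geometry.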

Take the vector 3, 4 from the opening and run it through three transformations.

A horizontal stretch. Send i-hat to the vector 2, 0 (twice as long, same direction) and leave j-hat at 0, 1. The matrix is

[ 2 0 ]
[ 0 1 ]

Apply it to the vector 3, 4:

3 · [2, 0] + 4 · [0, 1] = [6, 0] + [0, 4] = [6, 4]

The x-coordinate doubled, the y-coordinate held still. The whole plane got stretched sideways by a factor of two, and the vector 3, 4 went along for the ride to 6, 4.

A ninety-degree rotation. Rotate everything a quarter turn counterclockwise. Now i-hat, which pointed right, points up: it lands at 0, 1. And j-hat, which pointed up, now points left: it lands at negative-1, 0. The matrix is

[ 0 -1 ]
[ 1 0 ]

Apply it to the vector 3, 4:

3 · [0, 1] + 4 · [-1, 0] = [0, 3] + [-4, 0] = [-4, 3]

The point 3, 4 swung a quarter turn around the origin to negative-4, 3, exactly where a quarter turn should put it.

A shear. Leave i-hat where it is at 1, 0, but send j-hat rightward to 1, 1, so the vertical direction tips over like a deck of cards pushed sideways. The matrix is

[ 1 1 ]
[ 0 1 ]

Apply it to the vector 3, 4:

3 · [1, 0] + 4 · [1, 1] = [3, 0] + [4, 4] = [7, 4]

Points high above the x-axis slide far to the right, points near the x-axis barely move, and the unit square slants into a leaning parallelogram. That uneven sliding, more shift the higher you go, is what a shear looks like.

In each case you never had to think about the vector 3, 4 itself during the setup. You only decided where the basis vectors should go, wrote them as columns, and the arithmetic carried the rest. Stretch, rotate, shear: same recipe, different columns.
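
The three worked examples can be replayed in a few lines. This is a sketch under the assumption that a matrix is stored as a tuple of rows; the names `apply`, `stretch`, `rotate90`, and `shear` are chosen here for illustration.

```python
# Replay the stretch, rotation, and shear on the vector (3, 4).

def apply(m, v):
    """Return x * (first column) + y * (second column) for m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    x, y = v
    return (x * a + y * b, x * c + y * d)

stretch = ((2, 0),
           (0, 1))   # i-hat -> (2, 0), j-hat -> (0, 1)
rotate90 = ((0, -1),
            (1, 0))  # i-hat -> (0, 1), j-hat -> (-1, 0)
shear = ((1, 1),
         (0, 1))     # i-hat -> (1, 0), j-hat -> (1, 1)

v = (3, 4)
print(apply(stretch, v))   # (6, 4)
print(apply(rotate90, v))  # (-4, 3)
print(apply(shear, v))     # (7, 4)
```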

Sketching what a matrix does to the unit square

This is the capability to walk away with. Given any 2x2 matrix, you can sketch what it does to space without computing anything for individual points.

Start with the unit square, the square with corners at the origin, i-hat, j-hat, and i-hat plus j-hat. To see what a matrix does, plot where i-hat and j-hat land (the two columns), and draw the parallelogram they now span. That transformed parallelogram is the image of the unit square, and it tells you the whole story: how the space got stretched, rotated, sheared, or flipped.

Figure: The unit square becomes a parallelogram whose two sides are the columns of the matrix. Before: the unit square with corners (0, 0), (1, 0), (1, 1), (0, 1), with î along the bottom and ĵ along the left side. After L, with columns L(î) = [2, 0] and L(ĵ) = [1, 2]: a parallelogram with corners (0, 0), (2, 0), (3, 2), (1, 2).
The unit square at the origin becomes a parallelogram under any linear transformation. The two side vectors of that parallelogram are exactly the columns of the matrix: L(î) and L(ĵ). The matrix and the parallelogram are the same picture in two notations.

Work one all the way through. For the shear matrix above, i-hat stays at 1, 0 and j-hat moves to 1, 1. Plot those two arrows, and the unit square that sat upright becomes a parallelogram leaning to the right, its base still one unit wide but its top edge slid over. You did not test a single interior point; the two columns told you the shape.

For the horizontal stretch, the unit square becomes a 2-by-1 rectangle. For the rotation, it stays a unit square, just spun a quarter turn. For a matrix whose columns point in nearly the same direction, the square squashes almost flat, a clue that the transformation is collapsing the plane toward a line. You are reading geometry directly off the columns.
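
If you want to check a sketch against numbers, the corners of the transformed square can be computed directly. The snippet below is a minimal illustration; `apply` and `image_of_unit_square` are hypothetical helper names, and matrices are assumed to be stored as tuples of rows.

```python
# Where do the four corners of the unit square land under a 2x2 matrix?

def apply(m, v):
    """Return x * (first column) + y * (second column) for m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    x, y = v
    return (x * a + y * b, x * c + y * d)

def image_of_unit_square(m):
    """Corners in order: origin, i-hat, i-hat + j-hat, j-hat."""
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [apply(m, corner) for corner in corners]

shear = ((1, 1), (0, 1))
stretch = ((2, 0), (0, 1))
rotate90 = ((0, -1), (1, 0))

print(image_of_unit_square(shear))     # [(0, 0), (1, 0), (2, 1), (1, 1)]
print(image_of_unit_square(stretch))   # [(0, 0), (2, 0), (2, 1), (0, 1)]
print(image_of_unit_square(rotate90))  # [(0, 0), (0, 1), (-1, 1), (-1, 0)]
```

In every case the second and fourth corners are the two columns of the matrix, and the third corner is their sum.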

Matrices are the main verb of a neural network. A linear layer, the most common building block, holds a matrix of learned weights, and running data through that layer is exactly the matrix-times-vector operation from this lesson: take the incoming vector, apply a linear transformation, get a new vector. The network’s parameters are, in large part, the entries of these matrices, which is to say they are records of how each layer moves its input space.

A real layer usually adds a constant shift and then a nonlinear bend afterward, so the full layer is not purely linear. But the heavy lifting, the part with almost all the parameters, is the matrix multiply you just learned. When people say a model has billions of parameters, most of those parameters are sitting in matrices, each one a linear transformation waiting to be applied. Understanding what a single matrix does to space is understanding the atom of what every layer does to data.
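
A toy version of such a layer might look like the sketch below. Everything specific in it is an assumption made for illustration: the weights and bias are invented numbers rather than learned values, and ReLU (clamping negatives to zero) stands in for the nonlinear bend as one common choice among several.

```python
# A toy layer: matrix multiply, then a constant shift, then a nonlinear bend.

def linear_layer(weights, bias, x):
    """Apply weights (a list of rows) to x, add the bias, then apply ReLU."""
    shifted = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]
    # ReLU: negative entries become 0.0, positive entries pass through.
    return [max(0.0, value) for value in shifted]

weights = [[1.0, -2.0],
           [0.5, 1.0]]  # invented values, standing in for learned weights
bias = [0.5, -1.0]      # the constant shift (also invented)
x = [3.0, 4.0]

print(linear_layer(weights, bias, x))  # [0.0, 4.5]
```

The matrix multiply gives [-5.0, 5.5], the shift moves that to [-4.5, 4.5], and ReLU clamps the negative entry to zero.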

Reading a matrix row by row. The meaning lives in the columns, not the rows. The first column is where i-hat lands, the second is where j-hat lands. When a matrix confuses you, read it as two destination vectors standing side by side.

Forgetting the origin is pinned. A linear transformation cannot move the origin or slide the whole plane sideways. If a motion shifts everything over by a fixed amount, that is a different kind of operation (an affine one), not a linear transformation, and it cannot be captured by a 2x2 matrix alone.
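
A quick way to see the difference: whatever numbers a 2x2 matrix holds, it sends the origin to the origin, while a shift does not. The sketch below uses a hypothetical `shift` function with an arbitrary offset of (2, 1).

```python
# Every 2x2 matrix fixes the origin; a shift of the whole plane does not.

def apply(m, v):
    """Return x * (first column) + y * (second column) for m = ((a, b), (c, d))."""
    (a, b), (c, d) = m
    x, y = v
    return (x * a + y * b, x * c + y * d)

def shift(v, offset=(2, 1)):
    """Slide every point by a fixed offset (an affine move, not a linear one)."""
    return (v[0] + offset[0], v[1] + offset[1])

origin = (0, 0)
matrices = [((2, 0), (0, 1)), ((0, -1), (1, 0)), ((1, 1), (0, 1)), ((7, -3), (5, 9))]

print([apply(m, origin) for m in matrices])  # [(0, 0), (0, 0), (0, 0), (0, 0)]
print(shift(origin))                         # (2, 1)
```

Because x and y are both zero at the origin, the column combination is zero no matter what the columns are, so no choice of entries can reproduce the shift.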

Thinking the grid can curve. Stretch, rotate, shear, flip: yes. Curve or bunch unevenly: no. If the grid lines bend, the transformation is not linear, and none of this lesson’s machinery applies to it.

Treating matrix-vector multiplication as a memorized rule. It is not arbitrary. The product M times the vector is just the x-coordinate times column one plus the y-coordinate times column two, the linear combination from last lesson, with the transformed basis vectors as the ingredients.

  • A linear transformation moves the whole plane while keeping the origin fixed and grid lines straight, parallel, and evenly spaced. Stretch, rotate, shear, flip are allowed; curving and shifting are not.
  • A transformation is fully determined by where it sends i-hat and j-hat, because every vector is a combination of those two, and the transformation preserves the combination: L of the vector equals the x-coordinate times L of i-hat plus the y-coordinate times L of j-hat.
  • A 2x2 matrix is just those two landing spots written as columns, and the product M times the vector reassembles your vector out of them. To sketch what a matrix does, plot its two columns and draw the parallelogram they span.

A matrix is not a grid of numbers. It is a record of where the basis went, and that record is enough to move all of space. The next lesson asks the obvious follow-up: if one matrix is one move, what does it mean to do two moves in a row?