Summary: Matrices between dimensions

Every matrix so far was square: input and output the same size. Drop that assumption and a matrix can move between dimensions. The whole lesson is a pattern-extends payoff: a rectangular matrix maps a space of one dimension to a space of another (embedding a small space into a bigger one, or projecting a big space into a smaller one), and the same columns, rank, and null space describe exactly what it does. Nothing new to learn, only a new shape to read. This is the scan-it-in-five-minutes version.

Core ideas

An m × n matrix maps n-dimensional input to m-dimensional output. There are n columns, one per input basis vector; each column holds m numbers because it is that basis vector's landing spot in the output. Count the columns first: that is the input dimension, and the shape tells you the direction of the mapping.
More rows than columns is an embedding, provided the columns are independent (e.g. a 3x2 matrix takes 2D in, 3D out): it lays the input space down intact as a tilted plane inside the bigger space, losing nothing, so the null space is just the origin. Worked anchor: [[1,0],[0,1],[1,1]] sends [3,4] to [3,4,7], rank 2, null space {0}.
More columns than rows is a projection (e.g. a 2x3 matrix takes 3D in, 2D out): it squashes the input down. Worked anchor: [[1,0,0],[0,1,0]] sends [3,4,5] to [3,4], dropping height; its null space is the whole z-axis. A projection always crushes something: its rank is at most the row count m, which is less than the column count n, so at least n - m input directions land on zero.
The spaces keep their sides: the column space (reachable outputs, span of the columns) lives in the output; the null space (inputs crushed to zero) lives in the input. Rank is the dimension of the column space, capped by the smaller of m and n.
Conservation still balances: rank + nullity = number of columns. The striking case is a full-rank projection: it can use all of its small output and still have a null space, because the output is too small to keep every input distinct. (Full rank forces a trivial null space only when the rank equals the number of columns, which requires a square or tall matrix.) A matrix whose rank falls below the smaller of m and n is rank-deficient: a collapse onto something smaller than even its shape allows.
To classify any rectangular matrix: input dim = columns, output dim = rows, rank = dimension of the column space, and the meaning follows (full-rank tall = embedding; full-rank wide = projection with a null space; rank below the smaller of m and n = collapse).
This is the shape of nearly every neural network layer. A 256x768 matrix compresses a 768-dimensional vector to 256 (a projection, with a null space of discarded directions); a 768x256 matrix expands 256 to 768 (an embedding, when its columns are independent). Dimension reduction, the workhorse of autoencoders, attention projections, and model compression, is rectangular matrices earning their keep.
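
The worked anchors and the rank + nullity bookkeeping above can be checked with a minimal sketch in plain Python. It assumes a matrix is stored as a list of rows; the helper names apply, rank, and describe are illustrative choices, not part of the lesson, and the rank routine is ordinary Gaussian elimination over exact fractions rather than any particular library's method.

```python
from fractions import Fraction


def apply(matrix, vector):
    """Multiply an m x n matrix (a list of m rows) by a length-n vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def rank(matrix):
    """Count pivots found by Gaussian elimination, using exact fractions."""
    rows = [[Fraction(a) for a in row] for row in matrix]
    pivots = 0
    for col in range(len(rows[0])):
        # Look for a row at or below the current pivot row with a nonzero entry.
        hit = next((i for i in range(pivots, len(rows)) if rows[i][col] != 0), None)
        if hit is None:
            continue  # no pivot in this column
        rows[pivots], rows[hit] = rows[hit], rows[pivots]
        # Clear the entries below the pivot.
        for i in range(pivots + 1, len(rows)):
            factor = rows[i][col] / rows[pivots][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivots])]
        pivots += 1
        if pivots == len(rows):
            break  # every row already holds a pivot
    return pivots


def describe(matrix):
    """Read off input dim (columns), output dim (rows), rank, and nullity."""
    m, n = len(matrix), len(matrix[0])
    r = rank(matrix)
    return {"input_dim": n, "output_dim": m, "rank": r, "nullity": n - r}


tall = [[1, 0], [0, 1], [1, 1]]        # 3x2: 2D in, 3D out
wide = [[1, 0, 0], [0, 1, 0]]          # 2x3: 3D in, 2D out
collapsed = [[1, 2], [2, 4], [3, 6]]   # 3x2 with dependent columns

print(apply(tall, [3, 4]))      # the embedding anchor
print(apply(wide, [3, 4, 5]))   # the projection anchor
print(apply(wide, [0, 0, 1]))   # a z-axis vector, which lies in the null space
for name, mat in [("tall", tall), ("wide", wide), ("collapsed", collapsed)]:
    print(name, describe(mat))
```

In each case nullity is computed as columns minus rank, so the conservation rule holds by construction; the interesting part is which matrices end up with a nonzero nullity.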

What changes for you

Before this lesson, “the layer is 256 by 768” was probably just two numbers. Now it is a direction and a meaning: 768 in, 256 out, a projection that compresses and necessarily discards some directions (its null space). When you next read a model’s layer shapes, you can tell at a glance which layers expand a representation and which compress it, and you know that the compressing ones are throwing specific information away by design. The next lesson takes the most extreme rectangular case of all, a matrix with a single row that turns a vector into one number, and reveals it as the dot product in disguise.