Cheatsheet: Matrix multiplication as composition
The core idea
Multiplying matrices means composing transformations.

    AB = "first apply B, then apply A"
    (AB) · v = A · (B · v)

Read right to left: the matrix nearest the vector acts first, exactly like the inner function in f(g(x)).
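The identity (AB) · v = A · (B · v) can be checked numerically. The sketch below is a minimal illustration in plain Python; the helpers `apply` and `matmul` and the particular matrices `A`, `B`, and vector `v` are assumptions chosen for the example, not part of the cheatsheet.

```python
def apply(M, v):
    """Return the matrix-vector product M · v (M is a list of rows)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Return the matrix product AB."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

A = [[2, 0], [0, 1]]    # illustrative choice: stretch x by 2
B = [[0, -1], [1, 0]]   # illustrative choice: rotate 90 degrees counterclockwise
v = [3, 4]

two_steps = apply(A, apply(B, v))   # B acts first, then A
one_step = apply(matmul(A, B), v)   # the single combined matrix AB

print(two_steps)  # [-8, 3]
print(one_step)   # [-8, 3]
```

Both routes land on the same vector, which is what "AB means B first, then A" amounts to in numbers.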
How to compute AB
Each column of AB is A applied to the corresponding column of B.
| Step | Operation |
|---|---|
| First column of AB | A · (first column of B) |
| Second column of AB | A · (second column of B) |
The columns of B are where B sends the basis vectors; running them through A gives where the combined move sends them. (The row-times-column recipe gives the same numbers but hides this meaning.)
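As a rough sketch of the column view, the function below builds AB one column at a time and compares it with the row-times-column recipe. The names `matmul_by_columns` and `matmul_row_col` and the sample matrices are hypothetical, picked only for illustration.

```python
def apply(M, v):
    """Return the matrix-vector product M · v (M is a list of rows)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul_by_columns(A, B):
    """Column j of AB is A applied to column j of B."""
    cols_B = [list(col) for col in zip(*B)]
    cols_AB = [apply(A, col) for col in cols_B]
    return [list(row) for row in zip(*cols_AB)]   # turn columns back into rows

def matmul_row_col(A, B):
    """The rote recipe: entry (i, j) is row i of A dotted with column j of B."""
    inner = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]   # arbitrary sample matrices
B = [[5, 6], [7, 8]]

by_columns = matmul_by_columns(A, B)
by_recipe = matmul_row_col(A, B)

print(by_columns)  # [[19, 22], [43, 50]]
print(by_recipe)   # [[19, 22], [43, 50]]
```

The two functions return identical grids; only the column version makes it visible that each column is a transformed basis vector.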
Two properties
| Property | Statement | Why |
|---|---|---|
| Not commutative | AB ≠ BA in general | Order is the sequence of physical moves; rotate-then-shear ≠ shear-then-rotate. |
| Associative | (AB)C = A(BC) | The chain of moves is one fixed sequence; grouping changes only which adjacent pair you bundle first, never the order. |
Worked example: rotation R and shear S
    R = [ 0 -1 ]  (90 deg CCW)      S = [ 1  1 ]  (shear)
        [ 1  0 ]                        [ 0  1 ]

Rotate then shear (SR), apply S to columns of R:

    SR = [ 1 -1 ]      SR · [3, 4] = [-1, 3]
         [ 1  0 ]

Shear then rotate (RS), apply R to columns of S:

    RS = [ 0 -1 ]      RS · [3, 4] = [-4, 7]
         [ 1  1 ]

SR ≠ RS and [-1, 3] ≠ [-4, 7]: non-commutativity, made concrete.
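The numbers in this example can be reproduced with a few lines of plain Python. This is a minimal sketch: `R` and `S` are the matrices from the example, while the helper names `apply` and `matmul` are assumptions introduced here.

```python
def apply(M, v):
    """Return the matrix-vector product M · v (M is a list of rows)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Return the matrix product AB."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

R = [[0, -1], [1, 0]]   # rotate 90 degrees counterclockwise
S = [[1, 1], [0, 1]]    # shear
v = [3, 4]

SR = matmul(S, R)   # rotate first, then shear
RS = matmul(R, S)   # shear first, then rotate

print(SR, apply(SR, v))  # [[1, -1], [1, 0]] [-1, 3]
print(RS, apply(RS, v))  # [[0, -1], [1, 1]] [-4, 7]
print(SR == RS)          # False
```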
Associativity check
With A = [[2,0],[0,1]], B = R, C = S:

    (AB)C = [ 0 -2 ]      A(BC) = [ 0 -2 ]
            [ 1  1 ]              [ 1  1 ]

Same matrix either grouping. This is why ABC needs no parentheses.
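The same check can be run in code. In the sketch below (the helper name `matmul` is an assumption), the intermediate products AB and BC differ, yet both groupings end at the same matrix.

```python
def matmul(A, B):
    """Return the matrix product AB (matrices are lists of rows)."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

A = [[2, 0], [0, 1]]
B = [[0, -1], [1, 0]]   # R, the rotation
C = [[1, 1], [0, 1]]    # S, the shear

AB = matmul(A, B)        # bundle the left pair first
BC = matmul(B, C)        # bundle the right pair first

left = matmul(AB, C)     # (AB)C
right = matmul(A, BC)    # A(BC)

print(AB, BC)            # [[0, -2], [1, 0]] [[0, -1], [1, 1]]
print(left, right)       # [[0, -2], [1, 1]] [[0, -2], [1, 1]]
```

One numeric example does not prove associativity, but it shows what the property claims: the bundling differs, the order of moves C, B, A does not.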
Why it matters for AI
Composing linear transformations gives one linear transformation. So stacking linear layers alone collapses to a single layer (no added power). The nonlinear step between layers breaks that collapse, which is why depth helps at all.
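A small sketch can make the collapse concrete. It assumes two bias-free 2x2 layers with made-up integer weights `W1` and `W2`, and uses ReLU as the nonlinearity; none of these choices come from the cheatsheet. The linear stack matches the single matrix W2·W1, while the ReLU stack fails the additivity test f(x + y) = f(x) + f(y) that every matrix satisfies.

```python
def apply(M, v):
    """Return the matrix-vector product M · v (M is a list of rows)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Return the matrix product AB."""
    cols_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_B] for row in A]

def relu(v):
    """Elementwise max(0, x)."""
    return [max(0, x) for x in v]

W1 = [[1, -1], [2, 0]]   # hypothetical first-layer weights
W2 = [[1, 1], [0, 1]]    # hypothetical second-layer weights

def relu_stack(x):
    """Two layers with a ReLU in between."""
    return apply(W2, relu(apply(W1, x)))

x, y = [1, 2], [-1, -2]

# Two linear layers equal one layer with weights W2 · W1.
stacked = apply(W2, apply(W1, x))
collapsed = apply(matmul(W2, W1), x)
print(stacked, collapsed)            # [1, 2] [1, 2]

# With ReLU between the layers, additivity fails, so no single matrix can match.
sum_of_outputs = [a + b for a, b in zip(relu_stack(x), relu_stack(y))]
output_of_sum = relu_stack([a + b for a, b in zip(x, y)])
print(sum_of_outputs, output_of_sum)  # [3, 2] [0, 0]
```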
Pitfalls to dodge
- Reading left to right. AB applies B first (rightmost is closest to the vector).
- Swapping order. AB ≠ BA in general.
- Confusing commutative with associative. Order matters; grouping does not.
- Leaning on the rote recipe. Fall back to: each column of AB is A applied to that column of B.
The one-line version
Matrix multiplication is composition wearing a number grid: do one move, then the next, right to left.