Cheatsheet: Matrix multiplication as composition

Multiplying matrices means composing transformations.

AB = "first apply B, then apply A"
(AB) · v = A · (B · v)

Read right to left: the matrix nearest the vector acts first, exactly like the inner function in f(g(x)).
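The identity above can be checked numerically. The following is a minimal sketch in plain Python; the helper names `mat_vec` and `mat_mul` and the particular matrices are illustrative choices, not part of the cheatsheet.

```python
def mat_vec(m, v):
    """Apply matrix m (a list of rows) to vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def mat_mul(a, b):
    """Row-times-column product of a and b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

A = [[2, 0], [0, 1]]   # illustrative: stretch x by 2
B = [[0, -1], [1, 0]]  # illustrative: rotate 90 degrees counterclockwise
v = [3, 4]

one_step = mat_vec(mat_mul(A, B), v)   # (AB) · v
two_steps = mat_vec(A, mat_vec(B, v))  # A · (B · v): B acts first, then A

print(one_step, two_steps)  # both are [-8, 3]
```

Either route lands on the same vector, which is all that "AB means B then A" claims.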

Each column of AB is A applied to the corresponding column of B.

| Step                | Operation                |
| ------------------- | ------------------------ |
| First column of AB  | A · (first column of B)  |
| Second column of AB | A · (second column of B) |

The columns of B are where B sends the basis vectors; running them through A gives where the combined move sends them. (The row-times-column recipe gives the same numbers but hides this meaning.)
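The column view translates directly into code. This sketch builds a product one column at a time, assuming matrices are stored as lists of rows; `compose_by_columns` is a hypothetical helper name, and the two matrices are a 90-degree rotation and a shear chosen for illustration.

```python
def mat_vec(m, v):
    """Apply matrix m (a list of rows) to vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def compose_by_columns(a, b):
    """Build AB column by column: column j of AB is a applied to column j of b."""
    new_cols = [mat_vec(a, list(col)) for col in zip(*b)]  # zip(*b) yields b's columns
    return [list(row) for row in zip(*new_cols)]           # turn columns back into rows

S = [[1, 1], [0, 1]]    # shear
R = [[0, -1], [1, 0]]   # rotate 90 degrees counterclockwise

SR = compose_by_columns(S, R)
print(SR)  # [[1, -1], [1, 0]]
```

The columns of R are (0, 1) and (-1, 0); S sends them to (1, 1) and (-1, 0), and those are exactly the columns of the result.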

| Property        | Statement          | Why |
| --------------- | ------------------ | --- |
| Not commutative | AB ≠ BA in general | Order is the sequence of physical moves; rotate-then-shear ≠ shear-then-rotate. |
| Associative     | (AB)C = A(BC)      | The chain of moves is one fixed sequence; grouping changes only which adjacent pair you bundle first, never the order. |

R = [ 0 -1 ]  (90 deg CCW)      S = [ 1  1 ]  (shear)
    [ 1  0 ]                        [ 0  1 ]

Rotate then shear (SR), apply S to columns of R:

SR = [ 1 -1 ]      SR · [3, 4] = [-1, 3]
     [ 1  0 ]

Shear then rotate (RS), apply R to columns of S:

RS = [ 0 -1 ]      RS · [3, 4] = [-4, 7]
     [ 1  1 ]

SR ≠ RS and [-1, 3] ≠ [-4, 7]: non-commutativity, made concrete.
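The two orderings can be reproduced in a few lines. This is a small sketch in plain Python using the same R, S, and test vector; the helper names are illustrative.

```python
def mat_vec(m, v):
    """Apply matrix m (a list of rows) to vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def mat_mul(a, b):
    """Row-times-column product of a and b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

R = [[0, -1], [1, 0]]   # rotate 90 degrees counterclockwise
S = [[1, 1], [0, 1]]    # shear
v = [3, 4]

SR = mat_mul(S, R)      # rotate first, then shear
RS = mat_mul(R, S)      # shear first, then rotate

print(SR, mat_vec(SR, v))  # [[1, -1], [1, 0]] [-1, 3]
print(RS, mat_vec(RS, v))  # [[0, -1], [1, 1]] [-4, 7]
```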

With A = [[2,0],[0,1]], B = R, C = S:

(AB)C = [ 0 -2 ]      A(BC) = [ 0 -2 ]
        [ 1  1 ]              [ 1  1 ]

Same matrix either grouping. This is why ABC needs no parentheses.
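One example does not prove a rule, so it can be reassuring to try many. The sketch below rechecks the A, B, C example and then compares both groupings on 1000 random integer triples. Integer entries keep the arithmetic exact; the seed, entry range, and helper names are arbitrary choices for illustration. A passing check is evidence, not a proof.

```python
import random

def mat_mul(a, b):
    """Row-times-column product of a and b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def random_matrix(rng, n=2, lo=-5, hi=5):
    """Hypothetical helper: n-by-n matrix with integer entries in [lo, hi]."""
    return [[rng.randint(lo, hi) for _ in range(n)] for _ in range(n)]

A = [[2, 0], [0, 1]]
B = [[0, -1], [1, 0]]   # R
C = [[1, 1], [0, 1]]    # S

left = mat_mul(mat_mul(A, B), C)    # (AB)C
right = mat_mul(A, mat_mul(B, C))   # A(BC)
print(left, right)  # both are [[0, -2], [1, 1]]

rng = random.Random(0)  # fixed seed so the run is repeatable
all_agree = True
for _ in range(1000):
    X, Y, Z = (random_matrix(rng) for _ in range(3))
    if mat_mul(mat_mul(X, Y), Z) != mat_mul(X, mat_mul(Y, Z)):
        all_agree = False
print(all_agree)  # True
```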

Composing linear transformations gives one linear transformation. So in a neural network, linear layers stacked with nothing between them collapse to a single layer: W2 · (W1 · x) = (W2 W1) · x, with no added expressive power. The nonlinear step (the activation) between layers breaks that collapse, which is why depth helps at all.
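The collapse, and how a nonlinearity breaks it, can be seen on a toy two-layer example. This is a sketch under stated assumptions: the weights `W1` and `W2` are made-up values, there are no bias terms, and ReLU stands in for "the nonlinear step". The last check uses the fact that any linear map f satisfies f(-x) = -f(x).

```python
def mat_vec(m, v):
    """Apply matrix m (a list of rows) to vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def mat_mul(a, b):
    """Row-times-column product of a and b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(v):
    """Elementwise max(0, x), used here as the example nonlinearity."""
    return [max(0, x) for x in v]

W1 = [[1, -2], [0, 1]]  # made-up first-layer weights
W2 = [[1, 1], [0, 1]]   # made-up second-layer weights
x = [1, 2]
neg_x = [-1, -2]

# Two linear layers give the same result as the single matrix W2 W1.
stacked = mat_vec(W2, mat_vec(W1, x))
collapsed = mat_vec(mat_mul(W2, W1), x)
print(stacked, collapsed)  # both are [-1, 2]

# With ReLU in between, the network is no longer linear: f(-x) != -f(x).
f_x = mat_vec(W2, relu(mat_vec(W1, x)))
f_neg_x = mat_vec(W2, relu(mat_vec(W1, neg_x)))
print(f_x, f_neg_x)  # [2, 2] [3, 0]
```

Since f(-x) is not the negative of f(x), no single matrix can reproduce the two-layer network with ReLU.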

Common mistakes:

  • Reading left to right. AB applies B first: the rightmost matrix is closest to the vector.
  • Swapping order. AB ≠ BA in general.
  • Confusing commutative with associative. Order matters; grouping does not.
  • Leaning on the rote recipe. When the row-times-column rule stops making sense, fall back to: each column of AB is A applied to that column of B.

Matrix multiplication is composition wearing a number grid: do one move, then the next, right to left.