Solving by area ratios, Cramer's rule

A few lessons back, when we built the inverse, we said that whenever M is invertible, the solution to “M times the unknown equals the target” is “the unknown equals M-inverse times the target,” and then deliberately did not compute it. This lesson is one way to compute it. It is called Cramer’s rule, and what makes it worth your time is not that it is the fastest method (it is not), but that it falls straight out of a single fact you already own: a linear transformation scales every area by its determinant.

By the end you will solve a 2x2 system as a ratio of determinants, and, more importantly, you will see exactly why that ratio is the answer.

The problem, restated

We have an invertible 2x2 matrix M and a target vector b, and we want the unknown vector [x, y], with components the x-coordinate and the y-coordinate, that M sends to the target. This is the same solve-the-system question from the inverses lesson, where we learned the solution exists and is unique when the determinant of M is nonzero. Cramer’s rule pins down the x-coordinate and the y-coordinate from the entries of M and the target, without ever building M-inverse explicitly.

The one geometric idea

Here is the trick the whole rule rests on. Look at the parallelogram spanned by the unknown vector and the basis vector j-hat, the vector 0, 1. Its signed area is exactly the x-coordinate.

Why? The parallelogram has j-hat as one side, length 1, and the other side is the unknown vector. Its area is base times height: the base is j-hat (length 1, lying along the y-axis), and the height is the signed horizontal distance of the unknown vector from the y-axis, which is just the x-coordinate (negative when the vector points left, which is what makes the area signed). So area equals 1 times the x-coordinate, which is the x-coordinate. (You can check it against the determinant: the matrix with columns x, y and 0, 1 has determinant x times 1 minus 0 times y, which is exactly the x-coordinate.) The x-coordinate of the unknown is hiding in plain sight as an area.

j-hat is a base of length 1 along y; the unknown [x, y] reaches out to the right. The parallelogram they span has signed area exactly equal to x, because its base is 1 and its height (the rightward reach) is x. The unknown coordinate is geometrically just an area.
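The claim that a coordinate is a signed area can be checked numerically. The following is a minimal sketch; the helper name det2 and the sample vectors are chosen only for illustration.

```python
def det2(col1, col2):
    """Signed area of the parallelogram spanned by two 2D column vectors."""
    return col1[0] * col2[1] - col2[0] * col1[1]

i_hat = (1, 0)
j_hat = (0, 1)

# Sample unknowns, including one pointing left (negative x).
for unknown in [(3, 2), (-1.5, 4), (0, 7)]:
    x, y = unknown
    # Unknown first, j-hat second: the signed area is the x-coordinate.
    assert det2(unknown, j_hat) == x
    # i-hat first, unknown second: the signed area is the y-coordinate.
    assert det2(i_hat, unknown) == y
```

The order of the two vectors matters: swapping them flips the sign of the area, which is why the x-coordinate pairs the unknown with j-hat in that order and the y-coordinate pairs i-hat with the unknown.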

Apply the transformation

Now apply M to the whole picture. Every area scales by the determinant of M, the determinant lesson’s central fact. So the parallelogram whose area was the x-coordinate now has area the determinant of M times the x-coordinate.

But what is that transformed parallelogram made of? Its two sides were the unknown vector and j-hat. M sends the unknown vector to the target b (since M applied to the unknown is the target by definition) and sends j-hat to M applied to j-hat, which is just the second column of M. So the transformed parallelogram is spanned by the target and the second column of M, and its signed area is the determinant of the matrix with those two as columns.

Set the two expressions for that area equal:

det(M) · x = det([ b | second column of M ])

and solve:

x = det([ b | second column of M ]) / det(M)

That is the x-half of Cramer’s rule. The y-half comes from the identical argument with i-hat instead: the parallelogram spanned by i-hat and the unknown vector has signed area the y-coordinate, applying M scales it to the determinant of M times the y-coordinate, and the transformed parallelogram is spanned by the first column of M and the target. So:

y = det([ first column of M | b ]) / det(M)

Apply M to the previous parallelogram. j-hat lands at column 2 of M, and [x, y] lands at b = M·[x, y]. The new parallelogram is spanned by b and column 2 of M, with area det(M) times x. Divide the new area by det(M), and you recover x. That ratio IS Cramer's rule.
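Here is a small numeric check of the area-scaling step, as a sketch only. The matrix and the unknown are made-up values for illustration, and det2 and apply are hypothetical helper names.

```python
def det2(col1, col2):
    """Determinant of the 2x2 matrix whose columns are col1 and col2."""
    return col1[0] * col2[1] - col2[0] * col1[1]

def apply(col1, col2, v):
    """Apply the matrix with columns col1, col2 to the vector v."""
    return (col1[0] * v[0] + col2[0] * v[1],
            col1[1] * v[0] + col2[1] * v[1])

col1, col2 = (2, 1), (1, 3)      # illustrative M, not from the lesson
unknown = (4, -2)                # pretend we know the answer in advance
b = apply(col1, col2, unknown)   # the target that M sends the unknown to

det_m = det2(col1, col2)
area_before = det2(unknown, (0, 1))   # parallelogram of unknown and j-hat
area_after = det2(b, col2)            # parallelogram of b and column 2

# The transformed area is det(M) times the original area, which was x.
assert area_after == det_m * area_before
# Dividing the new area by det(M) recovers x.
assert area_after / det_m == unknown[0]
```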

The rule in one line

Both halves follow one pattern: to find a coordinate, replace the matching column of M with the target b, take that determinant, and divide by the determinant of M.

x = det(M with first column replaced by b) / det(M)
y = det(M with second column replaced by b) / det(M)
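The two formulas translate almost directly into code. This is a minimal sketch, not a production solver: the names det2 and cramer_2x2 are invented for illustration, and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def det2(col1, col2):
    """Determinant of the 2x2 matrix whose columns are col1 and col2."""
    return col1[0] * col2[1] - col2[0] * col1[1]

def cramer_2x2(col1, col2, b):
    """Solve M [x, y] = b for M with columns col1, col2 (integer entries assumed)."""
    det_m = det2(col1, col2)
    if det_m == 0:
        raise ValueError("det(M) is zero: no unique solution")
    x = Fraction(det2(b, col2), det_m)   # first column replaced by b
    y = Fraction(det2(col1, b), det_m)   # second column replaced by b
    return x, y

# Illustrative system: x + 2y = 5 and 3x + 4y = 6.
x, y = cramer_2x2((1, 3), (2, 4), (5, 6))
assert x + 2 * y == 5
assert 3 * x + 4 * y == 6
```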

This generalizes to any size: for an n by n invertible system, the i-th unknown is the determinant of M with its i-th column replaced by the target, divided by the determinant of M. The 2x2 case is just the smallest version of the same idea.

In 3D the same argument runs with volume in place of area. A coordinate of the unknown is the signed volume of a parallelepiped built from that vector and two basis vectors; applying M scales the volume by the determinant of M (now a volume-scaling factor, from the 3D determinant); and you solve identically, replacing one column of M with the target and dividing by the 3x3 determinant. The picture does not change, only the word: area becomes volume.

Notice the requirement built into the formula: you are dividing by the determinant of M. If the determinant of M is zero, the rule breaks with a division by zero, which is exactly right, because a zero determinant is the collapsed case from the inverses lesson, where the system has either no solution or infinitely many and no unique answer to compute.
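The general pattern, including the refusal on a zero determinant, can be sketched for any small size. The code below assumes the matrix is given as a list of rows (each a list of integers) and computes determinants by cofactor expansion, which is only sensible for tiny matrices. The function names are illustrative.

```python
from fractions import Fraction

def det(rows):
    """Determinant by cofactor expansion along the first row (tiny n only)."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        # Minor: drop the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total

def cramer(rows, b):
    """Solve M x = b: the i-th unknown is det(M with column i replaced by b) / det(M)."""
    det_m = det(rows)
    if det_m == 0:
        raise ValueError("det(M) is zero: no unique solution")
    solution = []
    for i in range(len(rows)):
        replaced = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(rows)]
        solution.append(Fraction(det(replaced), det_m))
    return solution

# Illustrative 3x3 system, built so that the answer is [1, 2, 3].
M = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 2]]
b = [4, 10, 8]
assert cramer(M, b) == [1, 2, 3]
```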

Worked examples

The system from the inverses lesson. Solve “twice the x-coordinate plus the y-coordinate equals 3” and “the x-coordinate plus the y-coordinate equals 2,” which is M times the unknown equals the target with M having first column 2, 1 and second column 1, 1, and the target, the vector 3, 2. The inverses lesson told us the answer is the vector 1, 1; let us recover it.

First, the determinant of M is 2 times 1 minus 1 times 1, which is 1. Then:

x = det([[3, 1], [2, 1]]) / 1 = (3 - 2) / 1 = 1
y = det([[2, 3], [1, 2]]) / 1 = (4 - 3) / 1 = 1

For the x-coordinate, the first column of M (which is 2, 1) was replaced by the target, the vector 3, 2, giving columns 3, 2 and 1, 1. For the y-coordinate, the second column was replaced. The result is the vector 1, 1, matching the inverses lesson, and the original equations check: 2 times 1 plus 1 equals 3, and 1 plus 1 equals 2.

Watch the geometry behind those numbers. The answer, the x-coordinate equals 1, is the signed area of the parallelogram spanned by the solution 1, 1 and j-hat, which is just 1. Apply M and that area scales by the determinant of M, which is 1, so it stays 1. The transformed parallelogram is the one spanned by the target, the vector 3, 2, and the second column 1, 1, whose area is the determinant of the matrix with first row 3, 1 and second row 2, 1, equal to 1. Because the determinant of M is 1 here, the numerator and the coordinate are the same number: the same area, before and after the transformation. The ratio recovered the coordinate exactly because that is what the ratio was built to do.

A fresh system. Solve “three times the x-coordinate minus the y-coordinate equals 7” and “the x-coordinate plus twice the y-coordinate equals 0,” so M has first column 3, 1 and second column negative-1, 2, and the target is the vector 7, 0. Here the determinant of M is 3 times 2 minus negative-1 times 1, which is 7. Then:

x = det([[7, -1], [0, 2]]) / 7 = (14 - 0) / 7 = 2
y = det([[3, 7], [1, 0]]) / 7 = (0 - 7) / 7 = -1

So the unknown vector is 2, negative-1. Check: 3 times 2 minus negative-1 equals 7, and 2 plus 2 times negative-1 equals 0. Both hold.
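Both worked systems can be replayed in a few lines. This sketch writes each matrix as two rows, matching the det([[a, b], [c, d]]) notation used in the computations above. The function names are illustrative, and a nonzero determinant is assumed.

```python
from fractions import Fraction

def det(m):
    """Determinant of a 2x2 matrix given as two rows: det([[a, b], [c, d]])."""
    (a, b), (c, d) = m
    return a * d - b * c

def cramer(m, target):
    """Cramer's rule for a 2x2 system; assumes det(m) is nonzero."""
    (a, b), (c, d) = m
    t1, t2 = target
    d_m = det(m)
    x = Fraction(det([[t1, b], [t2, d]]), d_m)   # first column replaced by the target
    y = Fraction(det([[a, t1], [c, t2]]), d_m)   # second column replaced by the target
    return x, y

systems = [
    ([[2, 1], [1, 1]], (3, 2)),    # 2x + y = 3,  x + y = 2
    ([[3, -1], [1, 2]], (7, 0)),   # 3x - y = 7,  x + 2y = 0
]
for m, target in systems:
    x, y = cramer(m, target)
    # Substitute the result back into each original equation.
    for (a, b), t in zip(m, target):
        assert a * x + b * y == t
    print(x, y)   # prints "1 1", then "2 -1"
```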

A collapsed system. Take the non-invertible M with first column 2, 1 and second column 4, 2, with the target, the vector 3, 2. Now the determinant of M is 2 times 2 minus 4 times 1, which is 0, so Cramer’s rule would divide by zero: it has no answer to give. That is honest, not broken. The underlying system has no unique solution: the vector 3, 2 is not on the column-space line (the line through 2, 1), since reaching it would need a scaling of 1.5 from the first coordinate but 2 from the second, a contradiction. No solution exists, and Cramer’s rule reports that by refusing to divide.
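The collapsed case can be sketched the same way. Here the hypothetical helper cramer_or_none returns None instead of dividing by zero, and the two scale factors from the paragraph above are compared directly.

```python
def det2(col1, col2):
    """Determinant of the 2x2 matrix whose columns are col1 and col2."""
    return col1[0] * col2[1] - col2[0] * col1[1]

def cramer_or_none(col1, col2, b):
    """Return (x, y), or None when det(M) is zero and no unique answer exists."""
    d = det2(col1, col2)
    if d == 0:
        return None
    return (det2(b, col2) / d, det2(col1, b) / d)

col1, col2 = (2, 1), (4, 2)   # second column is twice the first
b = (3, 2)

det_m = det2(col1, col2)      # 2*2 - 4*1 = 0
assert det_m == 0
assert cramer_or_none(col1, col2, b) is None

# Is b on the line through (2, 1)? Each coordinate demands a different scale.
scale_from_first = b[0] / col1[0]    # 1.5
scale_from_second = b[1] / col1[1]   # 2.0
assert scale_from_first != scale_from_second   # so b is off the line
```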

Why this matters when you use AI

The honest note first: Cramer’s rule is rarely how real systems get solved. Large linear systems inside machine learning are handled by iterative methods (the same family as gradient descent) and by Gaussian elimination, both far faster than computing a determinant for every unknown. You will not find Cramer’s rule in the inner loop of a training run.

Its value here is closure and intuition. The inverses lesson opened a question, how do you actually find the vector that lands on the target, and left it abstract. This lesson answers it with a formula you can derive from scratch using nothing but “areas scale by the determinant.” That is the real lesson: a solution method you might have met as a memorized rule is, underneath, a statement about how a transformation stretches area. Closed-form solutions for small systems, and the theory behind some exact solvers, lean on exactly this picture.

Common pitfalls

Replacing the wrong column. For the x-coordinate (the first coordinate), replace the first column of M with the target. For the y-coordinate, replace the second. Matching the coordinate to its column is the step people most often flip.

Forgetting to divide by the determinant of M. The numerator alone is not the answer; the answer is a ratio of determinants. The denominator is always the determinant of the original, unmodified matrix.

Trying to use it when the determinant is zero. The rule requires an invertible matrix. A zero determinant means no unique solution, and the division by zero is the rule telling you so, not a computational glitch to work around.

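Reaching for it on large systems. Cramer’s rule is a conceptual tool and a clean method for tiny systems. For anything sizable, elimination or an iterative solver is the right choice: Cramer’s rule needs n plus 1 determinants for n unknowns, and each determinant costs about as much as solving the whole system by elimination, so the total grows much faster than a single elimination.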
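To get a rough feel for that growth, the sketch below counts multiplications under one specific assumption: every determinant is computed by first-row cofactor expansion, the naive method. Determinants computed by elimination are far cheaper, so treat these numbers as a worst-case illustration. The n cubed column is only an order-of-magnitude stand-in for elimination, not an exact count.

```python
def cofactor_multiplications(n):
    """Multiplications used by first-row cofactor expansion of one n x n determinant."""
    if n == 1:
        return 0
    # n terms, each one multiplication plus the cost of an (n-1) x (n-1) minor.
    return n * (cofactor_multiplications(n - 1) + 1)

def cramer_multiplications(n):
    """Cramer's rule needs n + 1 determinants: one per unknown, plus det(M)."""
    return (n + 1) * cofactor_multiplications(n)

for n in (2, 3, 5, 10):
    print(n, cramer_multiplications(n), n ** 3)
```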

What you should remember

Cramer’s rule solves M times the unknown equals the target as a ratio of determinants: each coordinate is the determinant of M with its column replaced by the target, divided by the determinant of M. For 2x2, the x-coordinate is the determinant of the matrix with columns the target and the second column, all over the determinant of M; and the y-coordinate is the determinant of the matrix with the first column and the target, all over the determinant of M.
It works because a coordinate is a signed area, and transformations scale area by the determinant. The unknown x-coordinate is the area of the parallelogram from the unknown vector and j-hat; applying M scales that area by the determinant of M and turns it into a parallelogram you can measure from the target and a column of M. Equate and divide.
It needs a nonzero determinant, and the division by zero when the determinant is zero is the honest signal that the system has no unique solution, the collapsed case from the inverses lesson.

Cramer’s rule is the solve-the-system question answered in the language of area. You did not need the inverse explicitly; you needed the determinant lesson’s one fact, that area scales by the determinant, applied with care. The next lesson asks a different question about the same vectors: what happens to their coordinates when you change the basis you measure them in?