Lesson: Dot products and projection
The very first lesson made a promise. It said that when a model decides how much attention one word should pay to another, it compares their vectors “using the dot product, an operation you will meet later in this track.” This is later. Here is the dot product, and by the end you will see exactly why it is the natural way to ask “how much do these two vectors agree?”
The dot product takes two vectors of the same dimension and returns a single number. What makes it worth a whole lesson is that it has two formulas that look completely different, one built from coordinates and one built from angles, and they always produce the same answer. Seeing why they agree is the point, and the reason ties back to the rectangular matrices from last lesson.
The algebraic formula
Start with the mechanical definition, the one promised in the first lesson. To dot two vectors, multiply their matching components and add up the products:
[v1, v2, ..., vn] · [w1, w2, ..., wn] = v1·w1 + v2·w2 + ... + vn·wn
Multiply first-with-first, second-with-second, on down the line, then sum. The output is one number, not a vector. That is the whole rule, and it is exactly the “multiply and add” operation the first lesson said attention was built on.
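The multiply-and-add rule is short enough to write out directly. The following is a minimal sketch in plain Python; the function name `dot` and the choice to raise an error on mismatched dimensions are assumptions made for illustration, not something the lesson prescribes.

```python
def dot(v, w):
    """Multiply matching components and add the products."""
    if len(v) != len(w):
        # The dot product is only defined for vectors of the same dimension.
        raise ValueError("vectors must have the same dimension")
    return sum(vi * wi for vi, wi in zip(v, w))


print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
print(dot([2, -1], [5, 3]))       # 2*5 + (-1)*3 = 7
```

Whatever the dimension of the inputs, the return value is a single number.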
A clean first example:
[3, 4] · [1, 0] = (3)(1) + (4)(0) = 3
Dotting the vector [3, 4] with the x-axis direction [1, 0] returns 3, which is just the x-coordinate of [3, 4]. Hold that result; we are about to get it a second, completely different way.
The geometric formula
Here is the other formula for the same number:
v · w = |v| · |w| · cos(θ)
where |v| and |w| are the lengths of the two vectors and θ is the angle between them. This looks nothing like “multiply components and add,” yet it gives the identical answer every time. That is the surprise the lesson is built around.
This formula makes the meaning of the dot product visible, because the cosine of theta is large when two vectors point the same way and small or negative when they do not. Three cases follow directly from the cosine:
- Positive dot product: the vectors point in roughly the same direction. The angle between them is less than 90 degrees, so the cosine is positive.
- Zero dot product: the vectors are perpendicular. The angle is exactly 90 degrees, and the cosine of 90 degrees is 0, so the whole product is zero. A zero dot product is the clean test for “these two vectors are at right angles.”
- Negative dot product: the vectors point in roughly opposite directions. The angle is more than 90 degrees, where cosine goes negative.
So the sign of a dot product alone tells you whether two vectors broadly agree, ignore each other, or oppose. That is most of why it is useful.
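The three sign cases can be checked mechanically. This is a small sketch, assuming integer or exactly representable inputs; the helper name `relation` and its labels are invented for illustration. With arbitrary floating-point inputs, an exact comparison against zero would need a tolerance.

```python
def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def relation(v, w):
    """Classify two vectors by the sign of their dot product."""
    d = dot(v, w)
    if d > 0:
        return "broadly agree"   # angle below 90 degrees
    if d < 0:
        return "broadly oppose"  # angle above 90 degrees
    return "perpendicular"       # angle exactly 90 degrees


print(relation([2, 1], [1, 3]))    # dot = 5  -> broadly agree
print(relation([2, 1], [-1, 2]))   # dot = 0  -> perpendicular
print(relation([1, 2], [-3, -1]))  # dot = -5 -> broadly oppose
```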
A worked perpendicular case:
[1, 1] · [1, -1] = (1)(1) + (1)(-1) = 0
The first vector points northeast, the second southeast, and they sit at exactly 90 degrees to each other. The algebra says zero, the geometry says perpendicular, and both are the same fact.
Projection: what the number measures
The geometric formula has a vivid reading when one of the vectors has length 1. Let u-hat be a unit vector (length exactly 1). Then v · u-hat = |v| · cos(θ), and that quantity is the signed length of the projection of v onto the line through u-hat.
Picture dropping a perpendicular from the tip of v straight down onto the line that u-hat lies along. The distance from the origin to the point where it lands, the foot of the perpendicular, is the projection length. It is signed: positive if v leans the same way as u-hat, negative if it leans the opposite way. So v · u-hat answers “how far does v reach along the u-hat direction?”
This is exactly what happened in the first example. Dotting [3, 4] with [1, 0] (a unit vector along the x-axis) projected [3, 4] onto the x-axis, and the projection lands at 3, the x-coordinate. The dot product measured how far [3, 4] reaches in the x-direction.
That projection picture is also where the full geometric formula comes from. With a unit vector, v · u-hat is just the projection length. A general vector w is its length times its direction, so dotting against it scales the projection by that length:
w = |w| · u-hat
v · w = |w| · (v · u-hat) = |v| · |w| · cos(θ)
The length |w| is simply how many times longer w is than a unit vector, and it stretches the projection by the same factor. That is why the lengths-and-cosine formula has both magnitudes in it.
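The two-step reading (project onto the unit direction, then scale by the length) can be traced numerically. This is a sketch with example vectors chosen for illustration; the names `norm`, `u_hat`, and `projection_length` are assumptions, not notation from the lesson.

```python
import math


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def norm(v):
    """Length of a vector, computed as the square root of v . v."""
    return math.sqrt(dot(v, v))


v = [3.0, 4.0]
w = [4.0, 3.0]

# Split w into its length and its unit direction: w = |w| * u_hat.
u_hat = [wi / norm(w) for wi in w]

# Dotting with the unit vector gives the signed projection length of v.
projection_length = dot(v, u_hat)

# Scaling that projection by |w| recovers the full dot product v . w.
scaled = norm(w) * projection_length

print(math.isclose(norm(u_hat), 1.0))    # True
print(math.isclose(scaled, dot(v, w)))   # True
```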
Why the two formulas agree: duality
Why should “multiply components and add” equal “lengths times cosine”? The reconciling idea is the one last lesson set up, and it is genuinely beautiful.
Last lesson, a 1-by-n matrix was a transformation from n-dimensional space down to a single number. Write one out for 2D: the row matrix [a b] applied to the vector [x, y] produces a single number, and that number is itself a dot product:
[a b] · [x, y] = a·x + b·y = [a, b] · [x, y]
The act of dotting with the vector [a, b] is the same as applying the 1-by-2 matrix [a b], which is the vector [a, b] lying on its side. The vector and the transformation are the same object: a vector dotted against things is just a “vector-to-number” transformation, and that transformation is the vector lying on its side.
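To see the same identity in code, one can apply a 1-by-2 matrix to a vector with an ordinary matrix-times-vector routine and compare the result to the dot product. This is a sketch; the helper `matvec` and the sample numbers are assumptions for illustration.

```python
def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def matvec(matrix, vector):
    """Apply a matrix (a list of rows) to a vector: one output per row."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


a_b = [2, -1]       # the vector [a, b]
row_matrix = [a_b]  # the same numbers as a 1-by-2 matrix: one row
x_y = [5, 3]

print(matvec(row_matrix, x_y))  # [7]  (a one-entry output)
print(dot(a_b, x_y))            # 7
```

The matrix route returns a one-entry list and the dot product returns a bare number, but the number inside is the same.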
That is the duality the chapter is named for. Every vector secretly defines a way of turning other vectors into numbers (dot with me), and every linear transformation that outputs a single number is secretly a vector (the one you dot with). The algebraic formula is the matrix-times-vector computation; the geometric formula is what that same transformation does to space, namely project and scale. They agree because they are two descriptions of one operation, the same way a matrix and a transformation were two descriptions of one move several lessons ago.
Two more worked examples
Pinning down a cosine. Take the vector [3, 4] and the unit vector [1, 0] again. The algebraic formula gave 3. The geometric formula says |v| · |w| · cos(θ), and here |v| is 5 (the 3-4-5 right triangle from the first lesson) and |w| is 1, so:
5 · 1 · cos(θ) = 3 -> cos(θ) = 3/5 = 0.6
The angle between [3, 4] and the x-axis has cosine 0.6, about 53 degrees. The two formulas did not just agree on the dot product; together they handed us the exact angle.
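Solving the geometric formula for the angle is a common use of the two formulas together. A minimal sketch using only Python's standard `math` module follows; the function name `angle_degrees` and the clamping step are assumptions added for illustration.

```python
import math


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def norm(v):
    return math.sqrt(dot(v, v))


def angle_degrees(v, w):
    """Angle between two nonzero vectors, from cos(theta) = v.w / (|v| |w|)."""
    cos_theta = dot(v, w) / (norm(v) * norm(w))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


print(round(angle_degrees([3, 4], [1, 0]), 2))  # 53.13
```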
An opposing pair. Take the dot product:
[1, 0] · [-1, 1] = (1)(-1) + (0)(1) = -1
The result is negative, which the sign rule says means the vectors point more than 90 degrees apart. Check it geometrically: [1, 0] points east, [-1, 1] points northwest, and the angle between them is 135 degrees. With the length of [-1, 1] equal to √2 and cos(135°) about -0.707, the geometric formula gives about -1:
1 · √2 · (-0.707) ≈ -1
Negative dot product, obtuse angle, same number both ways.
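The same check can be run without rounding the cosine by hand. This sketch evaluates both formulas for the opposing pair; `dot_geometric` is a hypothetical helper name, and the 135 degree angle is taken from the text rather than computed.

```python
import math


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def dot_geometric(len_v, len_w, theta_degrees):
    """Lengths-and-cosine form of the dot product."""
    return len_v * len_w * math.cos(math.radians(theta_degrees))


algebraic = dot([1, 0], [-1, 1])
geometric = dot_geometric(1.0, math.sqrt(2), 135.0)

print(algebraic)            # -1
print(round(geometric, 6))  # -1.0
```

The geometric value matches only up to floating-point rounding, which is why the comparison is made after rounding.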
Commutativity
The dot product is commutative: v · w = w · v. From the algebraic formula this is obvious, since v1·w1 + v2·w2 is the same sum as w1·v1 + w2·v2. Geometrically it is less obvious (why should projecting v onto w and scaling by |w| give the same number as projecting w onto v and scaling by |v|?), and the duality insight is exactly what reconciles it: both are the single symmetric quantity |v| · |w| · cos(θ), and the angle between two vectors does not care which one you name first.
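A short numerical check makes the geometric side of this concrete: the two projection lengths are different numbers, but each one scaled by the other vector's length lands on the same dot product. This is a sketch with example vectors chosen for illustration; the variable names are assumptions.

```python
import math


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def norm(v):
    return math.sqrt(dot(v, v))


v = [3.0, 4.0]
w = [2.0, 0.0]

# Algebraically, order does not matter.
print(dot(v, w) == dot(w, v))  # True

# Geometrically, the two projection lengths differ...
proj_v_on_w = dot(v, w) / norm(w)  # how far v reaches along w's direction
proj_w_on_v = dot(w, v) / norm(v)  # how far w reaches along v's direction
print(math.isclose(proj_v_on_w, proj_w_on_v))  # False

# ...but scaling each by the other vector's length gives the same number.
print(math.isclose(proj_v_on_w * norm(w), proj_w_on_v * norm(v)))  # True
```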
Why this matters when you use AI
This is the lesson the first one pointed at, and the payoffs are everywhere.
Attention. When an attention mechanism asks how relevant one token is to another, it computes the dot product of a “query” vector with a “key” vector. A large dot product means the two vectors point the same way in the model’s space, so the token gets more attention. The comparison primitive at the heart of every transformer is the operation you just learned.
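As a toy illustration of that comparison, the sketch below scores one query against a few keys by dot product. The 2D vectors and token names are invented for illustration and do not come from any real model; production attention layers also scale and normalize these raw scores, which is left out here.

```python
def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


# Hypothetical query and key vectors, made up for this example.
query = [1.0, 0.5]
keys = {
    "cat": [0.9, 0.4],
    "the": [0.1, -0.2],
    "dog": [-0.5, 0.3],
}

# One raw relevance score per key: the dot product with the query.
scores = {token: dot(query, key) for token, key in keys.items()}

# The key that points most nearly the same way as the query scores highest.
best = max(scores, key=scores.get)
print(best)  # cat
```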
Cosine similarity. Dot two unit vectors and, by the geometric formula, you get exactly cos(θ), the cosine of the angle between them. That number, “cosine similarity,” is the standard measure of how similar two embeddings are, whether they represent words, sentences, or images. Search, clustering, and retrieval systems lean on it constantly: close in angle means close in meaning.
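For vectors that are not already unit length, the usual approach is to divide the dot product by both lengths, which is the geometric formula solved for the cosine. A minimal sketch follows; the function name and the error raised for zero vectors are assumptions for illustration.

```python
import math


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def norm(v):
    return math.sqrt(dot(v, v))


def cosine_similarity(v, w):
    """cos(theta) between two nonzero vectors: v.w / (|v| |w|)."""
    denom = norm(v) * norm(w)
    if denom == 0:
        # The angle to a zero vector is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(v, w) / denom


print(round(cosine_similarity([1, 2], [2, 4]), 6))    # 1.0   same direction
print(round(cosine_similarity([1, 0], [0, 3]), 6))    # 0.0   perpendicular
print(round(cosine_similarity([1, 2], [-1, -2]), 6))  # -1.0  opposite
```

Because both lengths are divided out, [1, 2] and [2, 4] score as identical in direction even though one is twice as long.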
A single neuron. A neuron in a linear layer takes the dot product of its weight vector with the input, then applies a nonlinearity. The weight vector is a direction the neuron cares about, and the dot product measures how strongly the input points that way. A neuron is, at its core, asking one dot-product question of its input.
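A sketch of that one-question neuron is below. It assumes ReLU as the nonlinearity and omits the bias term that real layers usually add; the weights and inputs are invented for illustration.

```python
def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


def relu(z):
    """An assumed nonlinearity: pass positive values, clip negatives to 0."""
    return max(0, z)


def neuron(weights, inputs):
    """Dot the weight vector with the input, then apply the nonlinearity."""
    return relu(dot(weights, inputs))


weights = [1, -1]  # hypothetical direction this neuron "cares about"

print(neuron(weights, [3, 1]))  # dot = 2  -> 2 (input leans the neuron's way)
print(neuron(weights, [1, 3]))  # dot = -2 -> 0 (input leans the other way)
```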
Common pitfalls
Expecting a vector back. The dot product returns a single number, not a vector. If you computed a vector, you did something else (that is the cross product, two lessons from now).
Forgetting the cosine sign. A dot product can be negative, and the sign is information, not an error: positive means broadly same direction, zero means perpendicular, negative means broadly opposite.
Reading the dot product as distance. It is not how far apart two vectors are; it is how much they point the same way, scaled by their lengths. Two far-apart vectors can have a large positive dot product if they point the same direction, and two nearby vectors can dot to zero if they are perpendicular.
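The difference between the two measures shows up clearly when both are computed side by side. This sketch uses `math.dist` from the standard library (Python 3.8 or later) for Euclidean distance; the sample vectors are chosen for illustration.

```python
import math


def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))


# Far apart, but pointing the same way: large positive dot product.
far_a, far_b = [1, 0], [100, 0]
print(math.dist(far_a, far_b))  # 99.0
print(dot(far_a, far_b))        # 100

# Close together, but perpendicular: dot product of zero.
near_a, near_b = [0.1, 0.0], [0.0, 0.1]
print(round(math.dist(near_a, near_b), 3))  # 0.141
print(dot(near_a, near_b))                  # 0.0
```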
Forgetting the unit-vector condition for projection. The quantity v · u-hat is the projection length only when u-hat has length 1. If w is not a unit vector, v · w is the projection length scaled by |w|.
What you should remember
- The dot product turns two vectors into one number, two ways that always agree: algebraically, multiply matching components and add (v1·w1 + v2·w2 and so on); geometrically, |v| · |w| · cos(θ). The sign tells you whether the vectors broadly agree (positive), are perpendicular (zero), or oppose (negative).
- Dotting with a unit vector projects. The quantity v · u-hat is the signed length of v’s shadow on the u-hat direction: how far v reaches that way.
- The two formulas agree because of duality: dotting with a vector is the same as applying the 1-row matrix that is the vector lying on its side. A vector and a “vector-to-number” transformation are the same object, which is also why this is the operation AI uses to compare vectors, in attention, in cosine similarity, and inside every neuron.
A dot product is one number that answers “how much do these two vectors point the same way?” Compute it by multiplying and adding, or by lengths and an angle; the answer is the same, because the vector you dot against is just a transformation in disguise. The next lessons turn to an operation that takes two vectors and gives back a third: the cross product.