
Summary: Dot products and projection

The very first lesson promised that AI compares vectors “using the dot product, an operation you will meet later.” This is later. The dot product takes two vectors and returns one number, and it has two formulas that look unrelated yet always agree. The whole lesson reduces to this: a dot product is one number answering “how much do these two vectors point the same way?”, computable from coordinates or from an angle, because a vector is a transformation in disguise. This is the scan-it-in-five-minutes version.

  • The algebraic formula: multiply matching components and add, v1·w1 + v2·w2 + ... + vn·wn. The output is a single number, exactly the “multiply and add” the first lesson said attention was built on. Anchor: [3,4] · [1,0] = 3.
  • The geometric formula: v · w = |v| · |w| · cos(θ), the two lengths times the cosine of the angle between them. It looks nothing like the algebraic version but gives the identical answer every time.
  • The sign carries the meaning, straight from the cosine, because the two lengths are never negative: positive means the vectors broadly point the same way (angle under 90 degrees), zero means perpendicular (cos 90° = 0, the clean right-angle test for nonzero vectors), negative means they broadly oppose. Anchor: [1,1] · [1,-1] = 0, two vectors at exactly 90 degrees.
  • Dotting with a unit vector is projection: v · u-hat = |v|·cos(θ) is the signed length of v’s shadow on the u-hat line, how far v reaches in that direction. For a general w, |v|·|w|·cos(θ) is that same shadow of v on w’s line, scaled by the length of w.
  • The two formulas agree because of duality. A 1-row matrix [a b] applied to [x, y] gives a·x + b·y, which is exactly [a, b] · [x, y]. Dotting with a vector is the same as applying the 1-row matrix you get by laying that vector on its side, so a vector and a “vector-to-number” transformation are one object. The algebraic formula is the computation; the geometric formula is what that transformation does to space.
  • The dot product is commutative (v · w = w · v), obvious algebraically and explained geometrically by the symmetric |v|·|w|·cos(θ).
  • This is the operation AI uses to compare vectors. Attention scores token relevance as query · key (bigger dot product = more attention). Cosine similarity, the dot of two unit vectors, equals cos(θ) and is the standard “how similar are these embeddings” measure for search, clustering, and retrieval. A single neuron computes weight · input and then applies a nonlinearity, asking how strongly the input points along its weight direction.
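
The points above can be checked numerically. What follows is a minimal sketch in plain Python, not code from the lesson: the helper names (dot, norm, cosine_similarity, shadow_length, matvec) are made up for illustration, and the angle is measured with atan2, which only works for 2D vectors. It reuses the two anchors from the list.

```python
import math

def dot(v, w):
    """Algebraic formula: multiply matching components and add."""
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    """Length of v: the square root of v dotted with itself."""
    return math.sqrt(dot(v, v))

def cosine_similarity(v, w):
    """cos(theta) for two nonzero vectors: the dot of their unit versions."""
    return dot(v, w) / (norm(v) * norm(w))

def shadow_length(v, u):
    """Signed length of v's shadow on the line through u (u must be nonzero)."""
    return dot(v, u) / norm(u)

def matvec(matrix, vec):
    """Apply a matrix (a list of rows) to a vector, written without dot()."""
    out = []
    for row in matrix:
        total = 0
        for j in range(len(vec)):
            total += row[j] * vec[j]
        out.append(total)
    return out

v, w = [3, 4], [1, 0]

# Algebraic anchor: [3,4] . [1,0]
print(dot(v, w))  # 3

# Geometric formula, with the angle measured independently (2D only).
theta = math.atan2(v[1], v[0]) - math.atan2(w[1], w[0])
geometric = norm(v) * norm(w) * math.cos(theta)
print(math.isclose(geometric, dot(v, w)))  # True

# Perpendicular anchor: [1,1] . [1,-1]
print(dot([1, 1], [1, -1]))  # 0

# Projection: w is already a unit vector, so the shadow length is the dot.
print(shadow_length(v, w))  # 3.0

# Duality: v laid on its side as a 1-row matrix gives the same number.
print(matvec([v], w) == [dot(v, w)])  # True
```

The geometric result is compared with math.isclose rather than == because the cosine is computed in floating point and can differ from 3 in the last digit.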

Before this lesson, the dot product may have been a formula you could evaluate (multiply and add) without a feel for what it meant. Now it is a single, interpretable question, “how much do these two point the same way?”, with a sign you can read and a projection picture behind it. When you next see query · key in an attention diagram, “cosine similarity” in a retrieval system, or a neuron’s weighted sum, you will recognize the same operation each time, and know it is a vector quietly acting as a transformation. The next lessons turn to the cross product: an operation that takes two vectors and returns a third, measuring how much they spread apart rather than how much they align.