Vectors aren't arrows: abstract vector spaces

Fifteen lessons ago, in the very first one, we laid out three views of a vector and said something about the third. The math view, we said, “is the one that sounds like a non-answer, and it is actually the deepest of the three. The mathematician does not care whether your vector is an arrow or a list. They care about two questions: can you add two of them together and get another one of the same kind, and can you multiply one by a number and get another one of the same kind?”

This is the lesson where that pays off. Everything since then has used arrows and grids, because they are easy to picture. But the arrow was always a teaching model, not the definition. The definition is the algebra, and the algebra runs on anything that follows the two rules, including objects that look nothing like arrows. This lesson shows two of them, and then names what you have actually built.

Functions are vectors

Consider the set of all real-valued functions, things like f of x equals x-squared or g of x equals the sine of x. Can you add two of them and stay in the set? Yes: define the sum, “(f plus g) of x equals f of x plus g of x,” adding their outputs point by point, and the result is another function. Can you scale one by a number? Yes: “(c times f) of x equals c times f of x” multiplies every output by c, and the result is again a function.

Both operations stay inside the set of functions, which is the closure idea from the inverses lesson, the “result stays in the same system” rule from the very first one. Addition and scaling behave coherently. By the math view’s own test, functions are vectors. The set of all functions is a vector space, even though no function is an arrow and most cannot be written as a finite list of numbers.
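The two operations above can be sketched in a few lines of Python. This is only an illustration: the helper names `add`, `scale`, and `square` are invented here, and a Python callable stands in for a real-valued function.

```python
import math

def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x). Returns a new function."""
    return lambda x: f(x) + g(x)

def scale(c, f):
    """Pointwise scaling: (c * f)(x) = c * f(x). Returns a new function."""
    return lambda x: c * f(x)

def square(x):
    return x * x

h = add(square, math.sin)   # h(x) = x^2 + sin(x), still a function
t = scale(3.0, square)      # t(x) = 3 * x^2, still a function

print(h(2.0))  # 4 + sin(2)
print(t(2.0))  # 12.0
```

Both `add` and `scale` take functions in and hand a function back, which is the closure property in code form.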

Polynomials, with actual coordinates

Functions in general form an infinite-dimensional space: there is no finite list of basis functions that combines to make every possible function, so the geometric tools, which lean on a finite basis, are harder to apply directly. Polynomials are the friendly case, because they have a clean finite basis.

Take all polynomials of degree 3 or less. Every one of them is built from the four pieces 1, x, x-squared, and x-cubed, combined by adding and scaling. Those four pieces are a basis for this space, in exactly the sense from the spans lesson: every polynomial of degree 3 or less is a unique linear combination of them, and there are four of them, so the space has dimension 4.

Once you fix that basis, polynomials get coordinates. The polynomial 2 x-squared plus 5x plus 7 is “7 times 1, plus 5 times x, plus 2 times x-squared, plus 0 times x-cubed,” so in that basis its coordinates are the four numbers 7, 5, 2, 0:

2x^2 + 5x + 7   <->   [7, 5, 2, 0]

That is a real vector of four numbers, and it behaves like every vector from earlier in the track. Adding two polynomials is adding their coordinate vectors: the sum of “3 x-squared plus 1” and “x plus 2” has coordinates 1, 0, 3, 0 plus 2, 1, 0, 0, which equals 3, 1, 3, 0, and reads back as 3 x-squared plus x plus 3. The function-space addition is just coordinate addition, the same operation from the first lesson, on a different kind of object.

Take the polynomial 2x² + 5x + 7 as one object. As a function it draws a parabola. As coordinates in the basis 1, x, x², x³, it is the column [7, 5, 2, 0]. Same object, two notations; the bridge is the chosen basis.
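Here is a small sketch of the coordinate view, assuming a polynomial is stored as a plain Python list of its coefficients in the basis 1, x, x-squared, x-cubed. The helpers `add_coords` and `evaluate` are names made up for this example.

```python
def add_coords(p, q):
    """Add two polynomials by adding their coordinate lists entry by entry."""
    return [a + b for a, b in zip(p, q)]

def evaluate(coords, x):
    """Read coordinates back as a function: sum of c_k * x^k."""
    return sum(c * x**k for k, c in enumerate(coords))

p = [1, 0, 3, 0]   # 3x^2 + 1
q = [2, 1, 0, 0]   # x + 2
s = add_coords(p, q)

print(s)  # [3, 1, 3, 0], which reads back as 3x^2 + x + 3

# Adding coordinates agrees with adding outputs point by point.
for x in (-1, 0, 2):
    assert evaluate(s, x) == evaluate(p, x) + evaluate(q, x)
```

The final loop checks, at a few sample points, that coordinate addition and function addition give the same polynomial.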

The derivative is a matrix

Here is the moment that makes the whole abstraction worth it. A transformation on polynomials can be linear in exactly the sense of the transformations lesson, and the derivative is.

Differentiation is linear: the derivative of a sum is the sum of the derivatives, and the derivative of a scaled function is the scaled derivative. Those are the two rules (additivity and homogeneity) that defined a linear transformation back in 2D, now satisfied by the derivative operator. And a linear transformation, as we learned, is fully captured by where it sends the basis vectors, written as the columns of a matrix.

So apply the derivative to each basis polynomial and record where it lands, in coordinates:

d/dx (1)   = 0      ->  [0, 0, 0, 0]
d/dx (x)   = 1      ->  [1, 0, 0, 0]
d/dx (x^2) = 2x     ->  [0, 2, 0, 0]
d/dx (x^3) = 3x^2   ->  [0, 0, 3, 0]

Those four columns are the matrix of the derivative:

D = [ 0  1  0  0 ]
    [ 0  0  2  0 ]
    [ 0  0  0  3 ]
    [ 0  0  0  0 ]

Now differentiate 2 x-squared plus 5x plus 7 by matrix multiplication. Its coordinates are 7, 5, 2, 0, and applying D is the linear-combination-of-columns operation from the matrix-vector lesson:

7·[0,0,0,0] + 5·[1,0,0,0] + 2·[0,2,0,0] + 0·[0,0,3,0] = [5, 4, 0, 0]

The result 5, 4, 0, 0 reads back as 4x plus 5, which is exactly the derivative of 2 x-squared plus 5x plus 7. Calculus, by matrix multiplication. The derivative you may have met as a limit is, on polynomials, the same matrix-times-vector operation you learned on arrows. That is the power of the abstraction: a tool built for geometry turns out to compute something from a completely different branch of math, because both obey the same two rules.

In the basis 1, x, x², x³, the derivative becomes a matrix. The 4 by 4 matrix D takes the coefficient of each x^k, multiplies it by k, and moves it into the x^(k-1) slot. Multiply D by the coordinate vector of 2x² + 5x + 7 and you get the coordinate vector of 4x + 5. The calculus rule and the matrix product are doing the same job.
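The same computation can be sketched in plain Python, with matrices as lists of rows. The function names `derivative_matrix` and `matvec` are invented for this example, not taken from any library.

```python
def derivative_matrix(n):
    """Matrix of d/dx on polynomials in the basis 1, x, ..., x^(n-1)."""
    D = [[0] * n for _ in range(n)]
    for k in range(1, n):
        # d/dx (x^k) = k * x^(k-1): column k has the entry k in row k-1.
        D[k - 1][k] = k
    return D

def matvec(M, v):
    """Matrix-vector product, one dot product per row."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

D = derivative_matrix(4)
p = [7, 5, 2, 0]        # 2x^2 + 5x + 7
dp = matvec(D, p)

print(D)   # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
print(dp)  # [5, 4, 0, 0], which reads back as 4x + 5
```

Applying `matvec(D, dp)` once more gives `[4, 0, 0, 0]`, the constant 4, which is the second derivative of the same polynomial.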

What makes something a vector space

Mathematicians make the math view precise with a short list of axioms: addition has to commute, scaling has to distribute over addition, there has to be a zero that changes nothing, and a few more in the same spirit. You do not need to memorize them. The single takeaway is this: any set whose addition and scaling obey those rules is a vector space, and every tool from this entire track applies to it.
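To get a feel for what the axioms ask, here is a spot check of three of them on polynomial coordinate lists. It is a sketch on one hand-picked example, not a proof, and the helpers `add` and `scale` are hypothetical names chosen for this illustration.

```python
def add(p, q):
    """Vector addition on coordinate lists."""
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    """Scalar multiplication on coordinate lists."""
    return [c * a for a in p]

p = [7, 5, 2, 0]      # 2x^2 + 5x + 7
q = [1, 0, 3, 0]      # 3x^2 + 1
zero = [0, 0, 0, 0]   # the zero polynomial
c = 3

commutes = add(p, q) == add(q, p)
distributes = scale(c, add(p, q)) == add(scale(c, p), scale(c, q))
zero_is_neutral = add(p, zero) == p

print(commutes, distributes, zero_is_neutral)  # True True True
```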

That is a remarkable promise. Spans, bases, dimension, linear transformations, eigenvectors, and change of basis all work unchanged on functions, on polynomials, on any object that adds and scales coherently. Matrices and the determinant come along too whenever the space has a finite basis, as the polynomials of degree 3 or less do. The geometric intuition you built on arrows is not stuck in 2D; it is the behavior of the rules, and the rules do not care what the objects are. Two quick examples make that concrete.

Change of basis, on polynomials. Set the unused x-cubed slot aside and work with polynomials of degree 2 or less. There, the polynomial 2 x-squared plus 5x plus 7 has coordinates 7, 5, 2 in the basis 1, x, x-squared. But that basis is a choice, exactly as in the change-of-basis lesson. Use the basis “1, (1 plus x), and (1 plus x) squared” instead and ask for the same polynomial’s coordinates there. Write 2 x-squared plus 5x plus 7 as “a, plus b times (1 plus x), plus c times (1 plus x) squared” and expand: the x-squared term is c, the x term is b plus 2c, and the constant is a plus b plus c. Matching terms gives c equals 2, then b equals 1, then a equals 4, so the coordinates are 4, 1, 2. Same polynomial, different basis, different coordinates, the same relativity you saw with arrows.
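The matching-terms step is a small linear system, and it can be sketched in code. The assumptions here are that each new basis polynomial is written out in the standard basis 1, x, x-squared and placed as a column of a matrix `P`, and that `solve_upper_triangular` is a helper written for this example (it works because this particular `P` is upper triangular).

```python
from fractions import Fraction

def solve_upper_triangular(U, b):
    """Solve U x = b by back substitution, for upper-triangular U."""
    n = len(b)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / U[i][i]
    return x

# Columns are the new basis in standard coordinates:
#   1 -> [1, 0, 0],  1 + x -> [1, 1, 0],  (1 + x)^2 -> [1, 2, 1]
P = [[1, 1, 1],
     [0, 1, 2],
     [0, 0, 1]]

standard = [7, 5, 2]                      # 2x^2 + 5x + 7
new_coords = solve_upper_triangular(P, standard)

print([int(c) for c in new_coords])       # [4, 1, 2]
```

Reading the answer back, 4 plus 1 times (1 plus x) plus 2 times (1 plus x) squared expands to 2 x-squared plus 5x plus 7, the polynomial we started with.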

Eigenvectors, of the derivative. On the space of differentiable functions, the eigenvectors of the derivative are the exponential functions, because differentiating e to the k-x gives back that same function scaled by k, the constant in the exponent. That is the eigenvector equation, M times the vector equals lambda times the vector, with the derivative as M and k as the eigenvalue. It is a large part of why exponentials run through the solutions of differential equations: they are the functions a derivative only stretches, never reshapes.
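A quick numerical check of that eigenvector equation, as a sketch: the derivative is approximated with a central difference, so the match is approximate rather than exact. The names `numerical_derivative` and `rate`, and the step size `h`, are choices made for this example.

```python
import math

def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

rate = 0.5                          # the constant k in e^(k x)

def f(x):
    return math.exp(rate * x)

# If f is an eigenvector of d/dx with eigenvalue k, then f'(x) / f(x)
# should be (approximately) k at every x, not just at one point.
ratios = [numerical_derivative(f, x) / f(x) for x in (-1.0, 0.0, 2.0)]
print(ratios)
```

Each printed ratio should sit very close to 0.5, whichever sample point it came from: the derivative rescales the function by the same factor everywhere.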

Why this matters when you use AI

This is where the track’s whole arc pays off for understanding AI, because modern machine learning lives almost entirely in abstract vector spaces, not in 2D arrows.

Embeddings are vectors in spaces of hundreds or thousands of dimensions: word embeddings, sentence embeddings, image and audio embeddings. You rank them with cosine similarity, which is the dot product from the dot-product lesson divided by the two vectors’ lengths, and compress them with the eigenvector and change-of-basis machinery behind PCA. None of those are arrows you can draw, but every operation on them is one you now understand.
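As a toy illustration of ranking by cosine similarity, here is a sketch with made-up four-dimensional vectors. Real embeddings come from a trained model and are far longer; the vectors and the names `doc_a`, `doc_b`, `doc_c` below are invented for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the two lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

query = [1.0, 2.0, 0.0, 1.0]
docs = {
    "doc_a": [2.0, 4.0, 0.0, 2.0],   # same direction as the query
    "doc_b": [0.0, 0.0, 3.0, 0.0],   # perpendicular to the query
    "doc_c": [1.0, 0.0, 1.0, 1.0],   # somewhere in between
}

ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]),
                reverse=True)
print(ranked)  # ['doc_a', 'doc_c', 'doc_b']
```

Dividing by the lengths means `doc_a` scores as a perfect match even though it is twice as long as the query: cosine similarity compares direction only.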

A typical neural network layer is a linear transformation followed by a nonlinear bend, and the linear part is a matrix acting on a vector in exactly the framework of the transformations lesson, just in a space too big to picture. Function spaces show up directly in the theory as well: the signals a convolution processes, and the kernels behind Gaussian processes and the neural tangent kernel, are functions treated as vectors. There, the abstraction from this lesson is load-bearing.
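A minimal sketch of one such layer, under some simplifying assumptions: the weights in `W` are made up, the nonlinear bend is taken to be ReLU (one common choice among several), and the bias term that real layers usually add is left out to keep the linear part visible.

```python
def matvec(W, x):
    """The linear part: a matrix acting on a vector."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def relu(v):
    """The nonlinear bend: negative entries are clipped to zero."""
    return [max(0.0, a) for a in v]

def layer(W, x):
    return relu(matvec(W, x))

W = [[1.0, -1.0,  0.0],
     [0.5,  0.5, -2.0]]          # maps 3 dimensions down to 2
x = [2.0, 1.0, 1.0]

print(matvec(W, x))              # [1.0, -0.5]
print(layer(W, x))               # [1.0, 0.0]
print(layer(W, [-a for a in x])) # [0.0, 0.5]
```

The last line shows why the bend matters: negating the input does not negate the output, so the full layer is not linear even though its matrix part is.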

The practical upshot: you can now read a paper or a doc that talks about “the latent space,” “the embedding space,” or “the function space” and know that all three mean a vector space in the sense of this lesson, where the intuition you built on a flat grid still holds.

To see the whole track in one pipeline, follow a retrieval system end to end. It embeds a query and a library of documents as vectors (the spaces from this lesson). It scores each document by the dot product of its embedding with the query’s, larger meaning more aligned (the dot-product lesson). It often compresses those embeddings into a smaller, better basis first, using the eigenvector and change-of-basis machinery behind PCA (the eigenvector and change-of-basis lessons). And the embeddings themselves come out of layers that are linear transformations followed by nonlinearities (the transformations and matrix-multiplication lessons). Five earlier lessons and this one, all in one system. None of it involves an arrow you could draw, and all of it is the geometry you spent this track building.

Common pitfalls

Thinking a vector has to be an arrow or a list. That was the teaching model, never the definition. A vector is anything you can add and scale coherently, and functions and polynomials qualify as fully as arrows do.

Forgetting that coordinates depend on a basis here too. A polynomial’s coordinates 7, 5, 2, 0 are relative to the basis 1, x, x-squared, x-cubed. Choose a different basis for the same space and the same polynomial gets different coordinates, exactly as in the change-of-basis lesson.

Assuming abstraction means new rules. It does not. Functions and polynomials use the same addition, the same scaling, the same matrices and eigenvectors. The objects changed; the algebra did not.

Treating the axioms as the thing to learn. The axioms are the formal fine print. The thing to learn is the consequence: satisfy them, and the entire track applies to you.

What you should remember

A vector is anything you can add and scale coherently, just as the first lesson said. Functions and polynomials pass that test, so they are vectors, and they form vector spaces despite looking nothing like arrows.
Fix a basis and abstract vectors get coordinates you can compute with. Polynomials of degree 3 or less have basis 1, x, x-squared, x-cubed and dimension 4; the derivative becomes an honest matrix, and differentiating is matrix-vector multiplication.
Every tool from this track carries over to abstract vector spaces. Spans, bases, dimension, transformations, eigenvectors, change of basis, and, wherever there is a finite basis, matrices and determinants: they are properties of the two rules, not of arrows. That is why the geometric intuition you built carries into the high-dimensional spaces where AI actually lives.

You started this track with three definitions of a vector that seemed to be about different things, and you end it knowing they were always one thing: an object you can add and scale, whether it is an arrow on a page, a list in a computer, a polynomial, or the internal state of a model with thousands of dimensions. The arrows were the scaffolding. The algebra was the building. And the next time you read about an embedding space or a latent space, you will know you have already done the geometry.