Summary: Giving agents memory

An agent is amnesiac by default; memory is something you design in. It lives in two very different places: the short-term context that lasts a single run, and the persistent memory that survives across runs. The interesting question is not where to store memory but what is worth remembering at all. This summary is the scan version of the full lesson.

Core ideas

A word-sense note: “memory” gets reused across AI. It can mean a neural network’s internal running state, or the context window a model reads in one pass. This lesson means a third thing: information an agent carries from one run to the next.
The loop forgets by default. The agent works within a single run; when the run ends, the working state is discarded. Nothing persists across runs unless you build a place for it.
Short-term context lasts one run: the conversation so far, this run’s tool results, and the agent’s scratchpad. It is the working memory that lets the agent remember, three steps in, what it already did. Gone when the run ends, like notes on a phone call.
Persistent memory survives across runs: preferences, learned facts, summaries of past conversations. It is what makes “book my usual sync” work, the usual was stored on an earlier run and read back on this one. It is what turns a capable tool into something that feels like an assistant.
The hard part is choosing what to keep, not where to store it. “Remember everything” fails three ways: context cost (the finite window fills with noise), staleness (stored facts go out of date), and privacy (persistent memory is stored personal data, a liability if you keep more than you need).
Four kinds are usually worth keeping: preferences, identity and stable facts, summaries of past runs, and corrections. The decision rule is not “can I store this” but “will this be worth having next time.”
Remembering is also updating. When a fact changes, revise the stored value rather than appending a contradicting one, or the agent holds both and cannot tell which is true. Memory is a living record, not a pile that only grows.
The two kinds work together. A run loads relevant persistent memory into short-term context at the start, works within it, and writes anything newly worth keeping back at the end. The agent feels continuous because each run reloads what mattered from the last.

What changes for you

“The AI remembers me” stops being magic and becomes a design decision you can interrogate. When an assistant recalls your preferences, you know that is persistent memory doing its job; when it forgets something from five minutes ago in a new session, you know that was short-term context that was never written down. And if you build an agent, you have the real question in hand from the start: not “how do I store memory” but “what is actually worth remembering, and when does it go stale?”