Lesson: Giving agents memory

Every agent in this track so far has had amnesia. The weather agent answered your question and forgot you existed. The meeting-booking agent finished and kept nothing. Run it again tomorrow and it starts from zero, as if you had never met.

That is fine for a one-shot tool, but it is not what people mean when they imagine a helpful assistant. The assistant people want remembers their name, knows they prefer afternoon meetings, and does not ask the same question every single time. The difference between those two experiences is memory, and memory is not something an agent has automatically. It is something you design in.

First, a quick note on the word itself, because it gets reused across AI and the senses are easy to mix up. In some contexts “memory” means a neural network’s internal running state. In others it means the context window a model reads in a single pass. This lesson uses a third, everyday sense: information an agent can carry from one run to the next. By the end you will be able to tell the two kinds of agent memory apart and, more importantly, decide what an agent should actually retain.

Recall the perceive-decide-act loop from Lesson 1: the agent reads the current state, decides what to do, acts, and repeats, all within a single run toward a single goal. Everything it works with lives in that run. When the run ends, the working state is discarded. There is no place information persists unless you build one.

So an agent has memory in exactly two places, and they are very different. One lives inside a single run. The other lives between runs. Keeping them straight is the whole lesson.

Short-term context: memory that lasts one run

The first kind is the information present during a single run: the conversation so far, the results of tools the agent has called this run, and the agent’s own scratchpad of intermediate steps and reasoning. This is the agent’s working memory. It is what lets the meeting agent remember, three steps later, that it already checked Sarah’s calendar.

The key fact about short-term context is that it is temporary. It exists for the duration of one run and then it is gone. Start a fresh run tomorrow and none of it carries over. It is less like a diary and more like the notes you keep on a single phone call: useful while the call is happening, discarded when you hang up.
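A minimal sketch of that idea, assuming a hypothetical `RunContext` class and a made-up calendar result (neither comes from a real framework): the context is created when a run starts, used while it lasts, and simply never saved.

```python
from dataclasses import dataclass, field


@dataclass
class RunContext:
    """Working memory for one run; nothing here is saved anywhere."""
    messages: list[str] = field(default_factory=list)            # conversation so far
    tool_results: dict[str, str] = field(default_factory=dict)   # tool outputs this run
    scratchpad: list[str] = field(default_factory=list)          # intermediate notes


def run_once(request: str) -> RunContext:
    ctx = RunContext()  # every run starts with an empty context
    ctx.messages.append(request)

    # Early step: call a (hypothetical) calendar tool and keep the result.
    ctx.tool_results["check_calendar:sarah"] = "free Tuesday 14:00"

    # Later step in the same run: the earlier result is still available.
    if "check_calendar:sarah" in ctx.tool_results:
        ctx.scratchpad.append("Sarah's calendar already checked")
    return ctx


first = run_once("Book a sync with Sarah")
second = run_once("Book a sync with Sarah")

# The second run knows nothing about the first: it holds only its own message.
print(second.messages)
```

Within a run the agent can look back at its own tool results; across runs there is nothing to look back at, because no step ever wrote the context anywhere.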

Persistent memory: memory that survives across runs

The second kind is information that outlives the run that created it: a user’s preferences, facts the agent has learned about the user or the world, and summaries of past conversations. This is what makes an assistant feel like it knows you. When you say “book my usual sync with Sarah” and the agent already knows your usual sync is 30 minutes on a Tuesday afternoon, that knowledge came from persistent memory, written down on some earlier run and read back on this one.

Worked example: the same request, with and without persistent memory

USER: Book my usual sync with Sarah.
WITHOUT persistent memory (amnesiac):
AGENT: "How long should the meeting be? Which day works? Is this
recurring?" (it has never heard of your "usual")
WITH persistent memory:
AGENT reads stored fact: usual sync = 30 min, Tuesday afternoons
AGENT -> books 30 min with Sarah, Tuesday 2:00pm
AGENT: "Booked your usual: 30 minutes with Sarah, Tuesday 2pm."

Same agent, same request. The difference is entirely whether a fact about you survived from a previous run. Persistent memory is what turns a capable tool into something that feels like an assistant.
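The contrast can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real agent: the JSON file, the `usual_sync` key, and the `handle_request` function are all hypothetical, and the "agent" is plain string matching.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Read stored facts, or return an empty dict if nothing was ever saved."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def handle_request(request: str, memory: dict) -> str:
    usual = memory.get("usual_sync")
    if "usual sync" not in request:
        return "Sorry, this sketch only books syncs."
    if usual is None:
        # Amnesiac path: nothing is known, so the agent has to ask.
        return "How long should the meeting be? Which day works?"
    return f"Booked your usual: {usual['minutes']} minutes with Sarah, {usual['when']}."


store = Path(tempfile.mkdtemp()) / "memory.json"
request = "Book my usual sync with Sarah."

# No stored facts yet: the agent falls back to questions.
reply_without = handle_request(request, load_memory(store))

# Some earlier run writes the fact down...
store.write_text(json.dumps({"usual_sync": {"minutes": 30, "when": "Tuesday 2pm"}}))

# ...and a later run reads it back.
reply_with = handle_request(request, load_memory(store))

print(reply_without)
print(reply_with)
```

The handler is identical in both calls. The only thing that changed is whether a file written on a previous run was there to be read.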

The hard part is not storing, it is choosing

It is tempting to think the interesting question is where to keep persistent memory. That is a real question, and the next lesson takes it up. But the harder and more important question comes first: what is worth remembering at all?

The naive answer, “remember everything,” is a trap, for three reasons.

  • Context cost. Anything you want the agent to use has to be loaded into its short-term context during a run, and that space is finite and not free. Remembering everything crowds the window with noise and raises cost and latency.
  • Staleness. Facts go out of date. A preference you stored a year ago may be wrong now. The more you remember indiscriminately, the more stale claims you carry.
  • Privacy. Persistent memory is stored data about a person. Remembering more than you need is not a feature; it is a liability you now have to protect and justify.
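The first two costs can be made concrete with a rough sketch. Everything here is an assumption for illustration: the record shape, the one-year freshness cutoff, and the use of a word count as a crude stand-in for a real token count. Privacy is a policy question and is not modeled.

```python
from datetime import date


def select_for_context(records, today, budget_words=10, max_age_days=365):
    """Load fresh records, most recently confirmed first, until the budget is spent."""
    fresh = [r for r in records if (today - r["confirmed"]).days <= max_age_days]
    fresh.sort(key=lambda r: r["confirmed"], reverse=True)

    chosen, used = [], 0
    for record in fresh:
        cost = len(record["text"].split())  # crude stand-in for a token count
        if used + cost > budget_words:
            continue  # does not fit in the remaining space
        chosen.append(record["text"])
        used += cost
    return chosen


records = [
    {"text": "Prefers email over phone for follow-ups", "confirmed": date(2025, 5, 1)},
    {"text": "Prefers afternoon meetings", "confirmed": date(2023, 1, 10)},
    {"text": "Summary of last call: discussed invoice dispute and agreed "
             "to follow up by email next week", "confirmed": date(2025, 4, 1)},
    {"text": "Account ID is 4471", "confirmed": date(2025, 3, 15)},
]

loaded = select_for_context(records, today=date(2025, 6, 1))
print(loaded)
```

With a ten-word budget, the two-year-old preference is skipped as stale and the long summary is skipped because it does not fit, leaving the two short, recent facts. The more that is stored indiscriminately, the more work this filtering step has to do on every run.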

So the skill is selectivity. Keep what is durable, reusable, and specific to the user: stable preferences, identity facts, useful summaries of what happened. Discard what is transient: the small talk, the one-off details that will not matter next time, the noise. Microsoft’s agent-memory material describes one way to automate this, a separate step that reviews a conversation and decides which parts, if any, are worth saving. Whether automated or hand-designed, the decision is the same: not “can I store this” but “will this be worth having next time.”

In practice, the persistent memory worth keeping tends to fall into a few recognizable kinds, and naming them makes the retention decision easier.

  • Preferences. Stable choices the user has expressed: email over phone, afternoons over mornings, terse answers over long ones. These pay off on almost every future run.
  • Identity and stable facts. Durable details like an account ID, a team name, a home city. They rarely change and are needed repeatedly.
  • Summaries of past runs. A short recap of what happened last time, so the agent can pick up a thread instead of re-asking. The summary is the memory, not the full transcript.
  • Corrections. When a user fixes the agent (“no, I meant the other Sarah”), that correction is worth keeping so the same mistake does not repeat.

If a candidate fact does not fall into one of these, it is usually noise. The categories are not a rulebook, but they are a fast filter.

From a support conversation, the agent considers what to retain:
"I prefer email over phone for follow-ups." -> KEEP (durable preference)
"My account ID is 4471." -> KEEP (stable, reusable fact)
"It's raining here today." -> DISCARD (transient, no future use)
"Thanks, have a good one!" -> DISCARD (noise)

Two facts worth carrying forward, two not. That triage, done well, is most of what good agent memory is.
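The triage above can be sketched as a small function. The keyword rules below are hypothetical and deliberately simplistic; a real retention step would more plausibly be a model call or a reviewed policy, so treat this only as the shape of the decision.

```python
import re

# Hypothetical patterns standing in for a real "is this worth saving?" step.
KEEP_RULES = [
    (re.compile(r"\bI prefer\b", re.IGNORECASE), "durable preference"),
    (re.compile(r"\bmy account id is\b", re.IGNORECASE), "stable, reusable fact"),
    (re.compile(r"\bno, I meant\b", re.IGNORECASE), "correction"),
]


def triage(utterance: str) -> tuple[str, str]:
    """Return (decision, reason) for one line of conversation."""
    for pattern, reason in KEEP_RULES:
        if pattern.search(utterance):
            return ("KEEP", reason)
    # Anything that matches no category is treated as noise.
    return ("DISCARD", "transient or noise")


conversation = [
    "I prefer email over phone for follow-ups.",
    "My account ID is 4471.",
    "It's raining here today.",
    "Thanks, have a good one!",
]

decisions = {line: triage(line) for line in conversation}
for line, (decision, reason) in decisions.items():
    print(f"{decision:8} {line}  ({reason})")
```

Note the default: a line is discarded unless some rule argues for keeping it. That mirrors the question the lesson poses, which is whether a fact will be worth having next time, not whether it can be stored.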

A subtle point hides inside the staleness problem: memory is not just something you add to, it is something you have to keep current. When a user changes a preference (“actually, mornings work better now”), the right move is to update the stored fact, not to append a second, contradicting one. An agent that only ever accumulates ends up holding both “prefers afternoons” and “prefers mornings” with no way to tell which is true. Good persistent memory supports change: facts get revised, and some get dropped when they stop being relevant. Treat memory as a living record, not a pile that only grows.
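One simple way to get this behavior, sketched under the assumption that each fact has a stable key (the `PreferenceStore` class and its method names are hypothetical): store one current value per key, overwrite on change, and record when each value was last updated so old entries can be revisited.

```python
from datetime import date


class PreferenceStore:
    """Keeps one current value per key, so contradictions cannot pile up."""

    def __init__(self):
        self._facts = {}

    def set(self, key, value, today):
        # Overwrite rather than append: the newest statement wins.
        self._facts[key] = {"value": value, "updated": today}

    def get(self, key):
        fact = self._facts.get(key)
        return fact["value"] if fact else None

    def forget(self, key):
        # Drop a fact that has stopped being relevant.
        self._facts.pop(key, None)

    def older_than(self, today, max_age_days):
        """Keys not updated recently: candidates to re-confirm or drop."""
        return [key for key, fact in self._facts.items()
                if (today - fact["updated"]).days > max_age_days]


store = PreferenceStore()
store.set("meeting_time", "afternoons", date(2024, 1, 10))
store.set("home_city", "Lisbon", date(2024, 1, 10))

# "Actually, mornings work better now": revise the fact in place.
store.set("meeting_time", "mornings", date(2025, 6, 1))

current = store.get("meeting_time")
stale = store.older_than(date(2025, 6, 1), max_age_days=365)
print(current, stale)
```

Because the store is keyed, there is never a moment when both “afternoons” and “mornings” are held at once. The `older_than` check is one possible hook for revisiting facts that have not been confirmed in a long time.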

Short-term and persistent memory are not rivals; they hand off to each other. A well-built run starts by loading the relevant persistent memory into the short-term context (your preferences, the summary of last time), works with that context for the duration of the run, and at the end writes anything newly worth keeping back into persistent memory. Persistent memory is the long-term store a run reads from at the start and writes back to at the end; short-term context is the working space in between. The agent feels continuous to you because each run quietly reloads what mattered from the last one.
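The hand-off can be sketched as three phases inside one function. This is a toy under stated assumptions: the plain dict `memory` stands in for whatever persistent store is actually used, and the `run` function and its string matching are hypothetical.

```python
def run(request: str, persistent: dict) -> str:
    # 1. Load: copy the relevant persistent facts into this run's context.
    context = {
        "request": request,
        "preferences": dict(persistent.get("preferences", {})),
        "last_summary": persistent.get("last_summary"),
    }

    # 2. Work: the agent acts using only what is in the context.
    if "mornings work better" in request:
        context["preferences"]["meeting_time"] = "mornings"
        reply = "Noted: mornings from now on."
    else:
        slot = context["preferences"].get("meeting_time", "any time")
        reply = f"Booked with Sarah ({slot})."

    # 3. Write back: persist what is worth keeping; the context itself is dropped.
    persistent["preferences"] = context["preferences"]
    persistent["last_summary"] = f"Last run: {reply}"
    return reply


memory = {"preferences": {"meeting_time": "afternoons"}}

first = run("Book a sync with Sarah.", memory)
second = run("Actually, mornings work better now.", memory)
third = run("Book a sync with Sarah.", memory)

print(first)
print(second)
print(third)
```

The third run books a morning slot even though nothing in its own request mentions mornings. That continuity comes entirely from the write-back at the end of the second run and the load at the start of the third.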

Common mistakes

  • Mistaking the context window for the agent’s memory. The context is short-term only; it vanishes when the run ends. Treating it as durable memory means everything is forgotten between runs.
  • Remembering everything. Indiscriminate memory bloats context, raises cost, and surfaces stale facts. Selectivity is the skill, not volume.
  • Never expiring anything. A preference stored once and never revisited can quietly become wrong. Durable is not the same as permanent.
  • Ignoring the privacy weight of memory. Persistent memory is stored personal data. Remembering more than you need is a liability, not a convenience.
  • Assuming memory is automatic. By default the loop forgets between runs. If an agent should remember, that is a thing you design, not a thing you get for free.

Key takeaways

  • An agent is amnesiac by default. Memory is designed in, not automatic.
  • Short-term context lasts one run: the conversation, recent tool results, and the scratchpad. It is gone when the run ends.
  • Persistent memory survives across runs: preferences, learned facts, summaries. It is what makes an assistant feel like it knows you.
  • The hard question is what to retain, not where to store it. Keep durable, reusable, user-specific facts; discard transient noise. Over-remembering costs context, freshness, and privacy.
  • The two work together: a run loads relevant persistent memory into short-term context at the start and writes what is worth keeping back at the end.

We have talked about what an agent should remember but not where that memory lives or how the agent pulls the right piece at the right moment. That is a retrieval problem, and it has its own design pattern. The next lesson covers agentic RAG: treating retrieval as a tool the agent chooses to call, so it can reach into a body of stored information exactly when it needs to.