Summary: Why sequences need memory
A sequence is ordered data where the order is the meaning: sentences, audio, time series. The networks built so far take a whole input at once and answer in one shot, which makes them poor at sequences for three concrete reasons. The fix is to give the network a memory, a running summary it updates as it reads one piece at a time. That memory is the hidden state, and a network built around it is a recurrent neural network (RNN). This summary is the scan-it-in-five-minutes version; the lesson builds the intuition and shows where simple recurrence breaks.
Core ideas
- Why a feedforward network fails at sequences. Three problems: its input size is fixed but sequences vary in length; it has no order and no memory (it sees one snapshot and forgets prior inputs); and it cannot share what it learns across positions (a pattern learned in one slot does nothing for another). It is a shape problem, not a size problem; more neurons do not help.
- The fix is recurrence. Read the sequence one element at a time, and after each element update a hidden state, a fixed-size running summary of everything seen so far. The network loops, feeding its own memory back into itself at every step. That is what “recurrent” means.
- One step of an RNN. Combine the current input with the previous hidden state (using the familiar weighted-sum-plus-squish) to produce the new hidden state, which is passed forward to the next step. Optionally read an answer out of the current state.
- Two payoffs dissolve all three problems. The same weights are reused at every step, which handles any length and shares what it learns across positions; and the hidden state carries context forward, supplying the missing memory. (Weight-reuse is the same trick convolution uses for images.)
- Where simple recurrence strains. The early signal fades: as the hidden state is overwritten step after step, distant information washes out. “Marie was born in Paris … she dreams in fluent ____” needs a word from the far start to survive, and a simple RNN tends to forget it. Short-range context, fine; long-range context, lost.
- The gated fix, and the next move. LSTMs and GRUs add gates that decide what to keep, overwrite, or forget, holding important early information longer; they are recurrence with a smarter memory, not a new idea. And processing strictly one step at a time, in order, has its own costs, which set the stage for attention.
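The one-step update and the weight reuse described above can be sketched in plain Python. This is a minimal illustration, not a trained model: the sizes, the random weights, and the names `rnn_step`, `run_rnn`, and `max_gap` are all assumptions made for the example, and the recurrent weights are deliberately kept small so that each step shrinks the trace of earlier inputs.

```python
import math
import random


def rnn_step(x, h_prev, W_x, W_h, b):
    """One recurrent step: weighted sum of input and previous state, then tanh."""
    h_new = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_x[i][j] * x[j] for j in range(len(x)))
        total += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))  # the "squish"
    return h_new


def run_rnn(sequence, W_x, W_h, b):
    """Read a sequence one element at a time, reusing the same weights."""
    h = [0.0] * len(b)  # empty memory before anything has been read
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h, b)  # new state replaces the old one
    return h


def max_gap(a, b):
    """Largest per-unit difference between two hidden states."""
    return max(abs(p - q) for p, q in zip(a, b))


# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
input_size, hidden_size = 3, 4
W_x = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
       for _ in range(hidden_size)]
# Assumption: small recurrent weights, so old information shrinks each step.
W_h = [[rng.uniform(-0.2, 0.2) for _ in range(hidden_size)]
       for _ in range(hidden_size)]
b = [0.0] * hidden_size

# The same weights handle sequences of different lengths.
short_seq = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(2)]
long_seq = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(50)]
h_short = run_rnn(short_seq, W_x, W_h, b)
h_long = run_rnn(long_seq, W_x, W_h, b)
print(len(h_short), len(h_long))  # both equal hidden_size

# Fading signal: perturb the first element versus the last element.
changed_first = [[v + 1.0 for v in long_seq[0]]] + long_seq[1:]
changed_last = long_seq[:-1] + [[v + 1.0 for v in long_seq[-1]]]
early_gap = max_gap(h_long, run_rnn(changed_first, W_x, W_h, b))
late_gap = max_gap(h_long, run_rnn(changed_last, W_x, W_h, b))
print("effect of changing the first element:", early_gap)
print("effect of changing the last element: ", late_gap)
```

Because the same `W_x`, `W_h`, and `b` serve every step, the loop accepts a 2-element and a 50-element sequence alike and always returns a state of the same size. With these small weights, changing the first element barely moves the final state while changing the last one moves it far more, a toy version of the fading described above. How a trained network behaves depends on the weights it learns.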
What changes for you
Sequence models sit behind a huge share of the AI you use: predictive text, transcription, translation, and the language models in chat assistants. The idea you now hold, carry a memory forward and update it as you read, is the foundation those systems were built on. It also demystifies the “context window” you hear about: that is just how much of a sequence a model can keep in mind at once, a direct descendant of this lesson’s problem. Knowing that memory is the hard part tells you where these tools are strong (short, local context) and where they strain (a long, distant thread). The next lesson, on attention, introduces the idea that drops the step-by-step march entirely and lets every position look at every other at once.
A feedforward network sees a snapshot; a recurrent network reads a story, carrying a memory forward word by word.