Cheatsheet: Why sequences need memory
The one idea that matters
sequence = ordered data where the order IS the meaning (text, audio, time series)

fix = give the network a MEMORY: a running summary it updates one step at a time

That memory is the “hidden state”; a network built around it is a recurrent neural network (RNN).
Why a feedforward network fails at sequences
| Problem | Why it breaks |
|---|---|
| Fixed input size | A sentence can be 3 words or 30; fixed input slots can’t fit both |
| No order, no memory | Sees one snapshot, answers in one shot; forgets the previous input |
| No sharing across positions | A verb learned in slot 2 does nothing for slot 9; must relearn per position |
It is a shape problem, not a size problem. More neurons do not help.
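The table's first and third rows can be seen in a toy sketch. The layer below is hypothetical: `SLOT_WEIGHTS` and `feedforward_score` are made-up names, and the weight values are arbitrary. It only illustrates that a fixed number of slots rejects a longer input, and that the same value is treated differently depending on the slot it lands in.

```python
# Hypothetical toy "feedforward layer" with exactly 3 input slots.
# Each slot has its own weight, so nothing is shared across positions.
SLOT_WEIGHTS = [0.5, -1.0, 2.0]  # made-up values, one per input slot


def feedforward_score(inputs):
    """Weighted sum over a fixed number of input slots."""
    if len(inputs) != len(SLOT_WEIGHTS):
        raise ValueError(
            f"expected {len(SLOT_WEIGHTS)} inputs, got {len(inputs)}"
        )
    return sum(w * x for w, x in zip(SLOT_WEIGHTS, inputs))


# The same value (1.0) is scored differently depending on its slot.
score_in_slot_0 = feedforward_score([1.0, 0.0, 0.0])  # uses weight 0.5
score_in_slot_2 = feedforward_score([0.0, 0.0, 1.0])  # uses weight 2.0

# A longer sequence simply does not fit the fixed slots.
try:
    feedforward_score([1.0, 0.0, 0.0, 0.0, 0.0])
    fits = True
except ValueError:
    fits = False
```

Adding more slots or more neurons would only move the limit; the layer would still have one fixed length and separate weights per position.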
The recurrence fix
Read one element at a time. Keep a hidden state (the memory). At each step:

new memory = combine( current input , previous memory )   [weighted sum + squish]

(optionally read an answer out of the current memory)

- Same weights reused every step → handles any length, shares what it learns across all positions.
- Memory carries context forward → “the clouds are in the ___” accumulates context word by word → “sky”.
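The update described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: it assumes tanh as the "squish", random untrained weights, and a zero starting memory, and the names `make_rnn`, `rnn_step`, `run_rnn`, `W_x`, `W_h`, and `b` are chosen here for illustration.

```python
import math
import random


def make_rnn(input_size, hidden_size, seed=0):
    """Create one small set of weights; it is reused at every step."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_x": rand_matrix(hidden_size, input_size),   # input -> memory
        "W_h": rand_matrix(hidden_size, hidden_size),  # memory -> memory
        "b": [0.0] * hidden_size,
    }


def rnn_step(params, x, h_prev):
    """new memory = tanh(W_x @ x + W_h @ h_prev + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = params["b"][i]
        total += sum(w * xi for w, xi in zip(params["W_x"][i], x))
        total += sum(w * hi for w, hi in zip(params["W_h"][i], h_prev))
        h_new.append(math.tanh(total))  # the "squish"
    return h_new


def run_rnn(params, sequence):
    """Read a sequence of any length, one element at a time."""
    h = [0.0] * len(params["b"])  # empty memory before the first element
    for x in sequence:
        h = rnn_step(params, x, h)
    return h


params = make_rnn(input_size=2, hidden_size=4)
short_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_seq = short_seq * 10  # 30 elements, same weights

h_short = run_rnn(params, short_seq)
h_long = run_rnn(params, long_seq)
```

Both calls use the same `params`, and both return a memory of the same size (4 numbers here) regardless of how long the input was. Feeding the same elements in a different order generally gives a different final memory, which is how order enters the result.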
Where simple recurrence struggles
Long-range dependencies fade. “I grew up in France … I speak fluent ___” needs a word from the far start to survive many steps; in a simple RNN the early signal washes out (the vanishing-signal problem again).
- LSTM / GRU: recurrence with gated memory, little gates that decide what to keep, overwrite, or forget. Same loop, smarter memory, holds context longer.
- Attention (next lesson): drops the step-by-step march; looks at all positions at once.
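A scalar toy makes the fading and the gated fix concrete. This is a sketch under loud assumptions: the memory is a single number, the recurrent weight `w` and the gate value `keep` are constants picked by hand, and `gated_carry` is a heavily simplified stand-in for a gate, not the actual LSTM or GRU equations (whose gates are learned and change at every step).

```python
import math


def influence_of_first_input(w, steps):
    """Sensitivity of the final memory to the first input.

    Scalar simple RNN: h_t = tanh(w * h_{t-1} + x_t), with x_1 = 1 and
    every later input 0. By the chain rule, the sensitivity is a product
    of one factor per step.
    """
    h = math.tanh(1.0)             # memory after reading x_1 = 1
    sensitivity = 1.0 - h * h      # d h_1 / d x_1
    for _ in range(steps - 1):
        h = math.tanh(w * h)
        sensitivity *= w * (1.0 - h * h)  # d h_t / d h_{t-1}
    return sensitivity


def gated_carry(h_first, keep, steps):
    """Simplified gate: h_t = keep * h_{t-1} + (1 - keep) * candidate.

    The candidate is fixed at 0 (later inputs bring nothing new) and
    `keep` is a hand-picked constant, purely for illustration.
    """
    h = h_first
    for _ in range(steps - 1):
        candidate = 0.0
        h = keep * h + (1.0 - keep) * candidate
    return h


fade_after_5 = influence_of_first_input(w=0.9, steps=5)
fade_after_50 = influence_of_first_input(w=0.9, steps=50)
kept_after_50 = gated_carry(h_first=0.76, keep=1.0, steps=50)
```

In the simple loop every step multiplies the early signal by a factor below 1, so `fade_after_50` is far smaller than `fade_after_5`. With the gate held fully open to "keep", the first memory passes through 50 steps unchanged.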
Pitfalls to dodge
- “Just add neurons.” No. Fixed snapshots have no order or memory at any size. Recurrence changes the shape.
- “The hidden state stores the whole sequence.” No. It is a fixed-size running summary, not a transcript, which is why distant detail is lost.
- “Each step uses different weights.” No. One small network’s weights are reused every step.
- “LSTM/GRU is a new idea.” No. It is still recurrence, with gates added to the memory; the carry-a-memory-forward core is unchanged.
Words to use precisely
- Sequence: ordered data where order carries meaning.
- Recurrent neural network (RNN): a network that reads one element at a time and loops, feeding its memory back into itself.
- Hidden state: the fixed-size running summary the network carries between steps (its memory).
- LSTM / GRU: recurrent designs with gates that control what memory to keep, overwrite, or forget.
- Context window (related): how much of a sequence a system can keep in mind at once.
The one-line version
A feedforward network sees a snapshot; a recurrent network reads a story, carrying a memory forward one piece at a time.