Why sequences need memory: cheatsheet

The one idea that matters

sequence = ordered data where the order IS the meaning (text, audio, time series)
fix       = give the network a MEMORY: a running summary it updates one step at a time

That memory is the “hidden state”; a network built around it is a recurrent neural network (RNN).

Why a feedforward network fails at sequences

Problem	Why it breaks
Fixed input size	A sentence can be 3 words or 30; fixed input slots can’t fit both
No order, no memory	Sees one snapshot, answers in one shot; retains nothing from the previous input
No sharing across positions	A verb learned in slot 2 does nothing for slot 9; must relearn per position

It is a shape problem, not a size problem. More neurons do not help.
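
A minimal sketch of the fixed-input-size problem, assuming NumPy. The sizes (EMBED, SLOTS, 8 hidden units) and the function name feedforward are made up for illustration. The layer is built for exactly 5 words, so a 3-word sentence does not fit, no matter how many hidden units are added.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED = 4   # hypothetical size of each word vector
SLOTS = 5   # this layer is built for exactly 5 words

# One weight matrix covering all 5 slots at once: shape (hidden, SLOTS * EMBED).
W = rng.normal(size=(8, SLOTS * EMBED))

def feedforward(words):
    """words has shape (num_words, EMBED) and is flattened into one snapshot."""
    x = words.reshape(-1)     # concatenate every word into one long vector
    return np.tanh(W @ x)     # only works when num_words == SLOTS

five_words = rng.normal(size=(5, EMBED))
three_words = rng.normal(size=(3, EMBED))

print(feedforward(five_words).shape)   # (8,)

try:
    feedforward(three_words)
except ValueError as err:
    print("3-word sentence rejected:", err)
```

Making W wider (more neurons) changes the 8, not the 20 input slots, so the mismatch stays.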

The recurrence fix

Read one element at a time. Keep a hidden state (the memory). At each step:

new memory = combine( current input , previous memory )    [weighted sum + squish]
(optionally read an answer out of the current memory)

Same weights reused every step → handles any length, shares what it learns across all positions.
Memory carries context forward → “the clouds are in the ___” accumulates context word by word → “sky”.
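
The update rule above can be sketched as follows, assuming NumPy, tanh as the squish, and random untrained weights. The names (rnn_step, run_rnn, W_x, W_h, b) and the sizes are illustrative choices, not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

INPUT_SIZE = 4    # hypothetical size of each input vector
HIDDEN_SIZE = 6   # hypothetical size of the memory (hidden state)

# One set of weights, reused at every step.
W_x = rng.normal(scale=0.5, size=(HIDDEN_SIZE, INPUT_SIZE))
W_h = rng.normal(scale=0.5, size=(HIDDEN_SIZE, HIDDEN_SIZE))
b = np.zeros(HIDDEN_SIZE)

def rnn_step(x, h_prev):
    # Weighted sum of current input and previous memory, then squish with tanh.
    return np.tanh(W_x @ x + W_h @ h_prev + b)

def run_rnn(sequence):
    h = np.zeros(HIDDEN_SIZE)   # empty memory before the first element
    for x in sequence:          # read one element at a time
        h = rnn_step(x, h)
    return h

short = rng.normal(size=(3, INPUT_SIZE))
long = rng.normal(size=(30, INPUT_SIZE))
print(run_rnn(short).shape, run_rnn(long).shape)   # (6,) (6,)
```

The 3-step and 30-step sequences go through the same three arrays, and both end in a memory of the same fixed size.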

Where simple recurrence struggles

Long-range dependencies fade. “I grew up in France … I speak fluent ___” needs a word from the far start to survive many steps; in a simple RNN the early signal washes out (the vanishing-signal problem again).
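
A toy illustration of the fade, under a strong simplification: the memory is a single number, there is no squish, and the recurrent weight w = 0.5 is picked by hand. The function name final_memory is hypothetical. A real RNN is messier, but the repeated shrinking works the same way.

```python
def final_memory(inputs, w=0.5):
    """Scalar recurrence: new memory = w * previous memory + current input."""
    h = 0.0
    for x in inputs:
        h = w * h + x
    return h

early = [1.0] + [0.0] * 20   # the important signal arrives first, then 20 empty steps
late = [0.0] * 20 + [1.0]    # the same signal arrives at the very end

print(final_memory(early))   # 0.5 ** 20, roughly 9.5e-07
print(final_memory(late))    # 1.0
```

The same input leaves almost no trace when it has to survive 20 more steps.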

LSTM / GRU: recurrence with gated memory. Small gates decide what to keep, overwrite, or forget. Same loop, smarter memory; it holds context longer.
Attention (next lesson): drops the step-by-step march; looks at all positions at once.
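
A rough sketch of what a gate buys in LSTM / GRU, assuming NumPy. This is not a full LSTM or GRU: in those designs the gate values are computed from the current input and previous memory with learned weights, whereas here the keep values are set by hand, and gated_step is a hypothetical name.

```python
import numpy as np

def gated_step(h_prev, candidate, keep):
    """Blend old memory with a new candidate; keep is in [0, 1] per memory slot."""
    return keep * h_prev + (1.0 - keep) * candidate

h = np.array([1.0, 1.0])       # memory right after reading an important early word
keep = np.array([0.99, 0.5])   # slot 0 is protected by its gate, slot 1 is not

for _ in range(20):
    candidate = np.zeros(2)    # 20 unrelated steps, each proposing to overwrite with 0
    h = gated_step(h, candidate, keep)

print(h)   # slot 0 stays near 0.82, slot 1 falls below 1e-6
```

A gate close to 1 lets a memory slot ride through many steps nearly untouched, which is how gated designs hold context longer.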

Pitfalls to dodge

“Just add neurons.” No. Fixed snapshots have no order or memory at any size. Recurrence changes the shape.
“The hidden state stores the whole sequence.” No. It is a fixed-size running summary, not a transcript, which is why distant detail is lost.
“Each step uses different weights.” No. One small network’s weights are reused every step.
“LSTM / GRU is a new idea.” No. It is still recurrence, with gates added to the memory; the carry-a-memory-forward core is unchanged.

Words to use precisely

Sequence: ordered data where order carries meaning.
Recurrent neural network (RNN): a network that reads one element at a time and loops, feeding its memory back into itself.
Hidden state: the fixed-size running summary the network carries between steps (its memory).
LSTM / GRU: recurrent designs with gates that control what memory to keep, overwrite, or forget.
Context window (related): how much of a sequence a system can keep in mind at once.

The one-line version

A feedforward network sees a snapshot; a recurrent network reads a story, carrying a memory forward one piece at a time.