
Cheatsheet: Why sequences need memory

sequence = ordered data where the order IS the meaning (text, audio, time series)
fix = give the network a MEMORY: a running summary it updates one step at a time

That memory is the “hidden state”; a network built around it is a recurrent neural network (RNN).

Why a feedforward network fails at sequences

| Problem | Why it breaks |
| --- | --- |
| Fixed input size | A sentence can be 3 words or 30; fixed input slots can’t fit both |
| No order, no memory | Sees one snapshot, answers in one shot; forgets the last input |
| No sharing across positions | A verb learned in slot 2 does nothing for slot 9; must relearn per position |
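The first and third rows can be seen in a few lines. Below is a minimal NumPy sketch of a hypothetical fixed-slot feedforward layer. The sizes `EMBED` and `SLOTS` and the function name `feedforward` are made up for illustration. The weight matrix has one column per (slot, feature) pair, so it accepts exactly three word vectors, and whatever it learns for slot 1 is stored in different weights from slot 3.

```python
import numpy as np

# Hypothetical fixed-slot feedforward layer (sizes are assumptions for illustration).
EMBED = 4   # length of each word vector
SLOTS = 3   # the layer was built for exactly 3 words
rng = np.random.default_rng(0)

# One weight per (slot, feature): columns 0-3 only ever see slot 1, columns 4-7 only slot 2, ...
# so nothing learned for one position is shared with another.
W = rng.normal(size=(8, SLOTS * EMBED))

def feedforward(words):
    """words: array of shape (n_words, EMBED). Only works when n_words == SLOTS."""
    x = words.reshape(-1)      # flatten the whole snapshot into one long vector
    return np.tanh(W @ x)      # one shot, no memory of anything before

three_words = rng.normal(size=(3, EMBED))
five_words = rng.normal(size=(5, EMBED))

print(feedforward(three_words).shape)
try:
    feedforward(five_words)    # 20 inputs vs. 12 columns: shapes do not line up
except ValueError as err:
    print("5 words do not fit:", err)
```

Making `W` bigger only moves the limit to a different fixed length; it does not remove it.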

It is a shape problem, not a size problem. More neurons do not help.

Read one element at a time. Keep a hidden state (the memory). At each step:

new memory = combine(current input, previous memory)   [weighted sum + squish]
(optionally read an answer out of the current memory)
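In the notation commonly used for a simple (Elman-style) RNN, the rule above can be written as follows, with tanh as the squish. Here x_t is the current input and h_{t-1} the previous memory, usually starting from an all-zero h_0; the weights W_x, W_h, W_y and biases b, c are the same at every step t. The symbol names are a conventional choice, not taken from this cheatsheet.

```latex
\begin{aligned}
h_t &= \tanh\left(W_x\, x_t + W_h\, h_{t-1} + b\right) \\
y_t &= W_y\, h_t + c \qquad \text{(optional readout)}
\end{aligned}
```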
  • Same weights reused every step → handles any length, shares what it learns across all positions.
  • Memory carries context forward → reading “the clouds are in the ___” word by word, the memory accumulates enough context to predict “sky”.
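The loop can be sketched in NumPy as below. This is a minimal illustration that assumes made-up sizes and random, untrained weights; `rnn_step` and `run_rnn` are hypothetical names. The same three arrays are used at every step, and the memory `h` has the same size whether the sequence has 3 elements or 30.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT, HIDDEN = 4, 8   # assumed sizes, chosen only for illustration

# One small network: these three arrays are the ONLY weights, reused at every step.
W_x = rng.normal(scale=0.5, size=(HIDDEN, INPUT))
W_h = rng.normal(scale=0.5, size=(HIDDEN, HIDDEN))
b = np.zeros(HIDDEN)

def rnn_step(x_t, h_prev):
    """new memory = squish(weighted sum of current input and previous memory)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def run_rnn(sequence):
    """Read a (length, INPUT) sequence one element at a time; return the final memory."""
    h = np.zeros(HIDDEN)            # empty memory before the first element
    for x_t in sequence:            # any length works: the loop just runs more or fewer times
        h = rnn_step(x_t, h)
    return h

short_seq = rng.normal(size=(3, INPUT))
long_seq = rng.normal(size=(30, INPUT))
print(run_rnn(short_seq).shape, run_rnn(long_seq).shape)   # fixed-size summary either way
```

Feeding the same elements in reverse order generally gives a different final memory: unlike a bag of inputs, the order now changes the result.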

Long-range dependencies fade. “I grew up in France … I speak fluent ___” needs “France”, from the far start, to survive many steps; in a simple RNN that early signal washes out because it is re-weighted and squished again at every step (the vanishing-signal problem again).
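The fading can be seen in a toy one-unit RNN. The sketch below assumes made-up weights `w = 0.9` and `u = 1.0` and a constant input, and tracks how strongly the final memory still depends on the first input (the derivative of h_T with respect to x_1). Every later step multiplies that dependence by `w` times the slope of tanh, both below 1 here, so it shrinks roughly geometrically with the number of steps. The function name `influence_of_first_input` is hypothetical.

```python
import math

# Hypothetical 1-unit RNN: h_t = tanh(w * h_{t-1} + u * x_t). w and u are assumed values.
w, u = 0.9, 1.0

def influence_of_first_input(inputs):
    """Return d h_T / d x_1: how strongly the final memory still depends on the first input."""
    h = 0.0
    grad = 0.0                          # running value of d h_t / d x_1
    for t, x in enumerate(inputs):
        h_new = math.tanh(w * h + u * x)
        slope = 1.0 - h_new ** 2        # derivative of tanh at this step
        if t == 0:
            grad = slope * u            # x_1 enters the memory directly at the first step
        else:
            grad = slope * w * grad     # afterwards it reaches h_t only through h_{t-1}
        h = h_new
    return grad

for steps in (1, 5, 20, 50):
    print(f"{steps:>2} steps: {influence_of_first_input([0.5] * steps):.2e}")
```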

Fixes

  • LSTM / GRU: recurrence with gated memory, where small learned gates decide what to keep, overwrite, or forget. Same loop, smarter memory; holds context longer.
  • Attention (next lesson): drops the step-by-step march; looks at all positions at once.
Common misconceptions

  • “Just add neurons.” No. Fixed snapshots have no order or memory at any size. Recurrence changes the shape.
  • “The hidden state stores the whole sequence.” No. It is a fixed-size running summary, not a transcript, which is why distant detail is lost.
  • “Each step uses different weights.” No. One small network’s weights are reused every step.
  • “LSTM/GRU is a whole new idea.” No. It is still recurrence, just with gated memory; the carry-a-memory-forward core is unchanged.
Key terms

  • Sequence: ordered data where order carries meaning.
  • Recurrent neural network (RNN): a network that reads one element at a time and loops, feeding its memory back into itself.
  • Hidden state: the fixed-size running summary the network carries between steps (its memory).
  • LSTM / GRU: recurrent designs with gates that control what memory to keep, overwrite, or forget.
  • Context window (related): how much of a sequence a system can keep in mind at once.
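The “gated memory” in the LSTM / GRU entries can be sketched with a GRU-style step in NumPy. This is a minimal illustration that assumes made-up sizes, random untrained weights, and no bias terms; names such as `gru_step` and `run_gru` are hypothetical, and libraries differ on whether the update gate weights the old or the new memory. The outer loop is the same as a plain RNN; only the step changes, using gates (values between 0 and 1) to decide per memory unit how much to keep, overwrite, or forget.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT, HIDDEN = 4, 8   # assumed sizes for illustration

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init(shape):
    return rng.normal(scale=0.5, size=shape)

# Three weight sets: update gate (z), reset gate (r), candidate memory (c). Biases omitted.
W_z, U_z = init((HIDDEN, INPUT)), init((HIDDEN, HIDDEN))
W_r, U_r = init((HIDDEN, INPUT)), init((HIDDEN, HIDDEN))
W_c, U_c = init((HIDDEN, INPUT)), init((HIDDEN, HIDDEN))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)        # update gate: 0 = keep old memory, 1 = overwrite
    r = sigmoid(W_r @ x_t + U_r @ h_prev)        # reset gate: 0 = ignore old memory for the candidate
    c = np.tanh(W_c @ x_t + U_c @ (r * h_prev))  # candidate new memory
    return (1.0 - z) * h_prev + z * c            # blend old and new, unit by unit

def run_gru(sequence):
    h = np.zeros(HIDDEN)
    for x_t in sequence:   # same loop as the plain RNN; only the step is smarter
        h = gru_step(x_t, h)
    return h

print(run_gru(rng.normal(size=(10, INPUT))))
```

If the update gate for a unit is near 0, that unit’s memory passes through the step almost unchanged, which is how gated designs hold context longer than a simple RNN.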

A feedforward network sees a snapshot; a recurrent network reads a story, carrying a memory forward one piece at a time.