Why sequences need memory
What you’ll learn
This is the first stop on Track 12’s tour of problem shapes, and it tackles sequences: data that arrives in order, where the order carries the meaning. The lesson opens by showing why the networks built so far (which take a whole input at once) are badly suited to ordered data, names the fix (give the network a memory it updates as it reads), and builds the intuition for the recurrent neural network. The source curriculum is MIT 6.S191, Lecture 2, by Alexander and Ava Amini, freely available at introtodeeplearning.com.
You will see the three concrete reasons a feedforward network fails at sequences, meet the hidden state as the network’s running memory, understand why reusing one set of weights at every step is what makes recurrence work, and see exactly where simple recurrence strains (long-range dependencies fade), which sets up the next lesson on attention.
Where this fits
This is lesson 2 of 10. It opens Phase 1 (Foundations and sequences) in earnest after the lesson-1 orientation. The previous lesson, What deep learning adds, set up the four problem shapes; this lesson covers the first of them. The next lesson, Attention and transformers, in brief, is the direct answer to the weakness this lesson ends on (recurrence is slow and forgetful over distance), and Track 5 then goes deep on transformers.
Before you start
Prerequisites: lesson 1 of this track, and comfort with the neural-network basics from the previous track (a network is layers of neurons with weights tuned by gradient descent). No new math is required here.
About the math
None. This lesson is conceptual: the intuition of a running memory, why it carries context forward, and why it eventually fades. There are no formulas to work; the practice section is a by-hand “trace the memory” exercise, not arithmetic.
By the end, you’ll be able to
- Explain why a plain feedforward network struggles with ordered data (fixed input size, no memory, no sharing across positions)
- Describe how a recurrent network reads one element at a time and updates a hidden state that carries memory forward
- Explain how reusing one set of weights at every step lets an RNN handle any length and share what it learns
- Identify where simple recurrence strains (long-range dependencies fade) and what gated designs like LSTMs and GRUs add
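Optional, for readers who prefer code to prose: the objectives above can be previewed in a few lines of plain Python. This is a minimal sketch, not the lesson's or the MIT course's implementation. It assumes a memory that is a single number, a tanh update rule, and hand-picked weights (`W_X`, `W_H`, `BIAS`) that a real network would learn; all names here are hypothetical.

```python
import math

# Hypothetical, hand-picked weights for illustration only.
# A real recurrent network learns these by gradient descent.
W_X = 0.5   # weight on the current input element
W_H = 0.8   # weight on the previous hidden state (the running memory)
BIAS = 0.0


def rnn_step(hidden, x):
    """One recurrence step: blend the old memory with the new element."""
    return math.tanh(W_H * hidden + W_X * x + BIAS)


def read_sequence(sequence):
    """Read elements in order, reusing the same weights at every step."""
    hidden = 0.0  # empty memory before anything has been read
    for x in sequence:
        hidden = rnn_step(hidden, x)
    return hidden


# The same three weights handle a sequence of any length.
print(read_sequence([1.0, 0.0, 0.0]))
print(read_sequence([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]))

# Order matters: the same elements in a different order leave a different memory.
print(read_sequence([1.0, 0.0, 0.0]) != read_sequence([0.0, 0.0, 1.0]))

# Long-range fade: one early signal followed by 30 empty steps
# leaves almost nothing in the memory by the end.
print(read_sequence([1.0] + [0.0] * 30))
```

The last line is the weakness in miniature: each step shrinks what is left of the early input, so after enough steps its trace in the hidden state is close to zero. Gated designs such as LSTMs and GRUs add learned controls over what the memory keeps and what it overwrites.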
Time and difficulty
- Read time: about 9 minutes
- Practice time: about 10 minutes (a by-hand “trace the memory” exercise plus flashcards)
- Difficulty: standard (conceptual; no math)