Why sequences need memory: brief

What you’ll learn

This is the first stop on Track 12’s tour of problem shapes, and it tackles sequences: data that arrives in order, where the order carries the meaning. The lesson opens by showing why the networks built so far (which take a whole input at once) are badly suited to ordered data, names the fix (give the network a memory it updates as it reads), and builds the intuition for the recurrent neural network. The source curriculum is MIT 6.S191, Lecture 2, by Alexander Amini and Ava Amini, freely available at introtodeeplearning.com.

You will see the three concrete reasons a feedforward network fails at sequences, meet the hidden state as the network’s running memory, and understand why reusing one set of weights at every step is what makes recurrence work. You will also see exactly where simple recurrence strains (long-range dependencies fade), which sets up the next lesson on attention.

Where this fits

This is lesson 2 of 10. It opens Phase 1 (Foundations and sequences) in earnest, after the lesson-1 orientation. The previous lesson, What deep learning adds, set up the four problem shapes; sequences are the first of them. The next lesson, Attention and transformers, in brief, is the direct answer to the weakness this lesson ends on (recurrence is slow and forgetful over distance), and Track 5 then goes deep on transformers.

Before you start

Prerequisites: lesson 1 of this track, and comfort with the neural-network basics from the previous track (a network is layers of neurons with weights tuned by gradient descent). No new math is required here.

About the math

None. This lesson is conceptual: the intuition of a running memory, why it carries context forward, and why it eventually fades. There are no formulas to work through; the practice section is a by-hand “trace the memory” exercise, not arithmetic.

By the end, you’ll be able to

Explain why a plain feedforward network struggles with ordered data (fixed input size, no memory, no weight sharing across positions)
Describe how a recurrent network reads one element at a time and updates a hidden state that carries memory forward
Explain how reusing one set of weights at every step lets an RNN handle any length and share what it learns
Identify where simple recurrence strains (long-range dependencies fade) and what gated designs like LSTMs and GRUs add
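
For readers who like to see an idea as code, the outcomes above can be sketched in a few lines of Python. This is an optional toy, not the lesson’s method and not taken from the source lecture: the memory is a single number, and the names read_sequence, W_HIDDEN and W_INPUT and their values are assumptions chosen for illustration. A real recurrent network uses learned weight matrices and a vector-valued hidden state.

```python
import math

# Hypothetical weights picked by hand for illustration; a real network learns them.
W_HIDDEN = 0.5  # how much of the previous memory carries into the next step
W_INPUT = 1.0   # how strongly the current element is written into memory


def read_sequence(sequence, h=0.0):
    """Read one element at a time, updating a single running memory h."""
    for x in sequence:
        # The same two weights are reused at every step of the loop.
        h = math.tanh(W_HIDDEN * h + W_INPUT * x)
    return h


# Any length works, because the loop just reuses the same weights.
short = read_sequence([1.0, 0.0])
long_ = read_sequence([1.0] + [0.0] * 20)

# Order matters: the same elements in reverse leave a different memory.
forward = read_sequence([1.0, 0.0, -1.0])
backward = read_sequence([-1.0, 0.0, 1.0])

print("memory after 2 steps: ", short)
print("memory after 21 steps:", long_)
print("forward vs backward:  ", forward, backward)
```

In this toy, the first element is still clearly present in the memory after two steps, but after twenty further steps its trace has shrunk to nearly zero. That is the fading of long-range dependencies in miniature, and it is the weakness that gated designs such as LSTMs and GRUs are built to reduce.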

Time and difficulty

Read time: about 9 minutes
Practice time: about 10 minutes (a by-hand “trace the memory” exercise plus flashcards)
Difficulty: standard (conceptual; no math)