References: Why sequences need memory
Source material
Source curriculum (structural mirror, cited as further study):
• MIT 6.S191, "Introduction to Deep Learning", Lecture 2: "Deep Sequence Modeling"
  Instructors: Alexander Amini and Ava Amini (MIT)
  Course page: https://introtodeeplearning.com
  Code and labs: https://github.com/aamini/introtodeeplearning
  License: MIT (slides, code, and labs); videos are YouTube standard
  Required attribution: "© Alexander Amini and Ava Amini, MIT 6.S191: Introduction to Deep Learning, IntroToDeepLearning.com"

Clawdemy's lessons are original prose that follows the pedagogical arc of this course. We do not reproduce or transcribe the lectures; we cite them as the recommended companion. Course materials are used under their MIT license with the attribution above; all rights to the original videos remain with the creators.
Watch this next
- MIT 6.S191, Lecture 2: Deep Sequence Modeling by Alexander and Ava Amini. The lecture this lesson mirrors, from the course page. It covers recurrent networks with the instructors’ own diagrams and then carries on into attention and transformers (which this track splits into the next lesson). Watch the recurrent-network portion now; the attention portion pairs with lesson 3.
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
- The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy. The famous post that made RNNs click for a generation of learners. It trains a character-level recurrent network on Shakespeare, code, and more, and shows the surprising structure it learns. The most enjoyable way to see “carry a memory forward” actually working.
- Understanding LSTM Networks by Christopher Olah. The clearest visual explanation of the gated-memory designs this lesson only named. If you want to see exactly how the keep/overwrite/forget gates work, start here; the diagrams are the gold standard.
Adjacent topics
Where this connects inside the track.
- What deep learning adds (lesson 1). The previous lesson set up the tour and the four problem shapes. Sequences are the first shape; this lesson is that promised first stop.
- Attention and transformers, in brief (lesson 3). Recurrence reads a sequence one step at a time and strains on long-range links. The next lesson meets a different answer that looks at all positions at once and weighs what matters. Track 5 (Transformers and LLMs) then goes deep on the architecture that idea produced.
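A toy sketch of the contrast in the last bullet, for readers who like to see it run. Nothing here comes from the lecture or the linked posts: the function names, the single-number memory, and the weights `w_h` and `w_x` are all assumptions chosen to keep the example tiny. It only illustrates the two reading styles, one step at a time versus all positions at once.

```python
import math

def recurrent_read(xs, w_h=0.5, w_x=1.0):
    """Read xs one step at a time, carrying one number of memory forward.

    w_h and w_x are hypothetical fixed weights, not learned values.
    """
    h = 0.0  # the memory starts empty
    for x in xs:
        # New memory mixes the old memory with the current input.
        h = math.tanh(w_h * h + w_x * x)
    return h

def attention_read(xs, query):
    """Look at every position at once and weigh each by similarity to query."""
    scores = [query * x for x in xs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax: weights sum to 1
    summary = sum(w * x for w, x in zip(weights, xs))
    return summary, weights

# One informative value, placed at the start or at the end of 21 steps.
early = [1.0] + [0.0] * 20
late = [0.0] * 20 + [1.0]

h_early = recurrent_read(early)  # the signal has faded after 20 more steps
h_late = recurrent_read(late)    # the signal was just seen

_, w_early = attention_read(early, query=1.0)
_, w_late = attention_read(late, query=1.0)
```

In this toy setup the recurrent memory of the early value shrinks toward zero with each further step, while the attention weight on that value is the same wherever it sits in the sequence. Real recurrent networks and transformers use learned vectors and matrices; lesson 3 and Track 5 cover the actual architecture.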