References: Attention and transformers, in brief
Source material
Source curriculum (structural mirror, cited as further study):
- MIT 6.S191, "Introduction to Deep Learning", Lecture 2: "Deep Sequence Modeling"
  Instructors: Alexander Amini and Ava Amini (MIT)
  Course page: https://introtodeeplearning.com
  Code and labs: https://github.com/aamini/introtodeeplearning
  License: MIT (slides, code, and labs); videos are under the standard YouTube license
  Required attribution: "© Alexander Amini and Ava Amini, MIT 6.S191: Introduction to Deep Learning, IntroToDeepLearning.com"

This lesson mirrors the attention/transformer portion of Lecture 2 (the recurrence portion is mirrored in lesson 2). Clawdemy's lessons are original prose that follows the pedagogical arc of this course. We do not reproduce or transcribe the lectures; we cite them as the recommended companion. Course materials are used under their MIT license with the attribution above; all rights to the original videos remain with the creators.
Watch this next
- MIT 6.S191, Lecture 2: Deep Sequence Modeling by Alexander and Ava Amini. The lecture this lesson mirrors. Its later portion introduces self-attention and transformers with the instructors’ own animations; pair it with this lesson for the visual version of “look at everything at once.”
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
- Clawdemy Track 5 (Transformers and LLMs). This is the obvious next move if this brief tour left you wanting the real mechanics. Track 5 builds the transformer piece by piece, including how attention actually computes its relevance weights, what it means to attend in several ways at once, and how a model tracks word order without reading in order. Everything this lesson deferred lives there.
- “Attention Is All You Need” (Vaswani et al., 2017). The paper that introduced the transformer and dropped recurrence entirely. The primary source for everything in this lesson; dense, but worth seeing once you have the intuition, if only to recognize how compact the original idea was.
- The Illustrated Transformer by Jay Alammar. The most widely loved visual walk-through of the transformer, introducing the pieces one at a time with clear diagrams. The gentlest bridge between this survey and the full mechanics.
Adjacent topics
Where this connects inside the track.
- Why sequences need memory (lesson 2). The previous lesson built recurrence and named its weaknesses (slow, forgetful over distance). This lesson is the answer to those weaknesses, so read them as a pair.
- How machines see: convolution (lesson 4). We now leave sequences for the second problem shape, images. The next lesson is about wiring a network to look at small local patches of an image, the idea called convolution.