References: Attention and transformers, in brief

Source material

Source curriculum (structural mirror, cited as further study):
• MIT 6.S191, "Introduction to Deep Learning", Lecture 2: "Deep Sequence Modeling"
  Instructors: Alexander Amini and Ava Amini (MIT)
  Course page: https://introtodeeplearning.com
  Code and labs: https://github.com/aamini/introtodeeplearning
  License: MIT (slides, code, and labs); videos are under the Standard YouTube License
  Required attribution: "© Alexander Amini and Ava Amini, MIT 6.S191:
    Introduction to Deep Learning, IntroToDeepLearning.com"
This lesson mirrors the attention/transformer portion of Lecture 2 (the
recurrence portion is mirrored in lesson 2). Clawdemy's lessons are original
prose that follows the pedagogical arc of this course. We do not reproduce or
transcribe the lectures; we cite them as the recommended companion. Course
materials are used under their MIT license with the attribution above; all
rights to the original videos remain with the creators.

Watch this next

MIT 6.S191, Lecture 2: Deep Sequence Modeling by Alexander and Ava Amini. The lecture this lesson mirrors. Its later portion introduces self-attention and transformers with the instructors’ own animations; pair it with this lesson for the visual version of “look at everything at once.”

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

Clawdemy Track 5 (Transformers and LLMs). This is the obvious next move if this brief tour left you wanting the real mechanics. Track 5 builds the transformer piece by piece, including how attention actually computes its relevance weights, what it means to attend in several ways at once, and how a model tracks word order without reading in order. Everything this lesson deferred lives there.
“Attention Is All You Need” (Vaswani et al., 2017). The paper that introduced the transformer and dropped recurrence entirely. The primary source for the architecture this lesson surveys; dense, but worth seeing once you have the intuition, if only to recognize how compact the original idea was.
The Illustrated Transformer by Jay Alammar. One of the most widely read visual walk-throughs of the transformer, introducing the pieces one at a time with clear diagrams. The gentlest bridge between this survey and the full mechanics.

Adjacent topics

Where this connects inside the track.

Why sequences need memory (lesson 2). The previous lesson built recurrence and named its weaknesses (slow, forgetful over distance). This lesson is the answer to those weaknesses, so read the two lessons as a pair.
How machines see: convolution (lesson 4). We now leave sequences for the second problem shape, images. The next lesson is about wiring a network to look at small local patches of an image, the idea called convolution.