References: What transformers do
Source material
Source curriculum (structural mirror, cited as further study):
• Hugging Face, "LLM Course", Chapter 1: "Transformer models"
  Authors: the Hugging Face team (Lewis Tunstall, Leandro von Werra, Lysandre Debut, Sylvain Gugger, Merve Noyan, and others)
  Course page: https://huggingface.co/learn/llm-course/chapter1
  Code and notebooks: https://github.com/huggingface/course
  License: Apache 2.0 (prose and code)
  Required attribution: "Based on the Hugging Face LLM Course (huggingface.co/learn/llm-course), © Hugging Face, used under the Apache 2.0 license. This is an independent structural mirror; Hugging Face does not endorse it."
This lesson mirrors the structure of Chapter 1 (what a transformer does, the three architectural shapes, the timeline, pre-training versus fine-tuning, the named limits, and the Hugging Face ecosystem). Clawdemy's lessons are original prose that follows the pedagogical arc of the course. We do not reproduce or transcribe the course; we cite it as the recommended companion. Course materials are used under the Apache 2.0 license with the attribution above, which requires a link to the license and an indication of changes, and does not permit implying endorsement.
Read this next
- Hugging Face LLM Course, Chapter 1: Transformer models. The chapter this lesson mirrors. Free, Apache-2.0 licensed, and the natural companion to everything in Track 14. Read it for the course’s own framing and to meet the pipeline() function you will use in the next lesson.
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
- “Attention Is All You Need” by Vaswani et al. (2017). The paper that introduced the transformer and the multi-head attention mechanism. You do not need the math to use the models, but this is the spine of the whole field, and skimming the abstract and the architecture diagram is worth ten minutes.
- The Illustrated Transformer by Jay Alammar. The clearest visual walk-through of how attention actually moves information between tokens. The right thing to read if “attention in the middle” still feels like a black box and you want the picture without the equations.
- The Hugging Face Hub. Browse real model cards. Filter by task and notice how models announce their shape (encoder, decoder, encoder-decoder), their base model, and their license. Reading a few cards now makes the next lessons feel familiar.
Adjacent topics
Where this connects inside the track and the wider curriculum.
- Track 5 (Transformers and LLMs). The mechanics under the hood: queries, keys, values, multi-head attention, positional encoding. This lesson deliberately skips that math; Track 5 is where you get it if you want it.
- Run any model in a few lines: pipelines and Auto classes (lesson 2). The next lesson stops describing transformers and starts running them. You will load a pretrained model and get a real result with the pipeline() function in a handful of lines of Python.