References: What transformers do

Source material

Source curriculum (structural mirror, cited as further study):
• Hugging Face, "LLM Course", Chapter 1: "Transformer models"
  Authors: the Hugging Face team (Lewis Tunstall, Leandro von Werra,
    Lysandre Debut, Sylvain Gugger, Merve Noyan, and others)
  Course page: https://huggingface.co/learn/llm-course/chapter1
  Code and notebooks: https://github.com/huggingface/course
  License: Apache 2.0 (prose and code)
  Required attribution: "Based on the Hugging Face LLM Course
    (huggingface.co/learn/llm-course), © Hugging Face, used under the
    Apache 2.0 license. This is an independent structural mirror;
    Hugging Face does not endorse it."
This lesson mirrors the structure of Chapter 1 (what a transformer does,
the three architectural shapes, the timeline, pre-training versus
fine-tuning, the named limits, and the Hugging Face ecosystem). Clawdemy's
lessons are original prose that follows the pedagogical arc of the course.
We do not reproduce or transcribe the course; we cite it as the recommended
companion. Course materials are used, with the attribution above, under the
Apache 2.0 license, which requires a link to the license and an indication
of changes, and does not permit implying endorsement.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

• “Attention Is All You Need” by Vaswani et al. (2017). The paper that introduced the transformer and the multi-head attention mechanism. You do not need the math to use the models, but this is the spine of the whole field, and skimming the abstract and the architecture diagram is worth ten minutes.
• The Illustrated Transformer by Jay Alammar. The clearest visual walk-through of how attention actually moves information between tokens. The right thing to read if “attention in the middle” still feels like a black box and you want the picture without the equations.
• The Hugging Face Hub. Browse real model cards. Filter by task and notice how models announce their shape (encoder, decoder, encoder-decoder), their base model, and their license. Reading a few cards now makes the next lessons feel familiar.

Adjacent topics

Where this connects inside the track and the wider curriculum.

• Track 5 (Transformers and LLMs). The mechanics under the hood: queries, keys, values, multi-head attention, positional encoding. This lesson deliberately skips that math; Track 5 is where you get it if you want it.
• Run any model in a few lines: pipelines and Auto classes (lesson 2). The next lesson stops describing transformers and starts running them. You will load a pretrained model and get a real result with the pipeline() function in a handful of lines of Python.
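
As a preview of what "a handful of lines" can look like, here is a minimal sketch, not the code from lesson 2. It assumes the transformers package is installed and uses the "sentiment-analysis" task as an arbitrary example. The helper name run_sentiment and its optional classifier argument are hypothetical, added only so the function can be tried without downloading a model.

```python
from typing import Callable, Optional


def run_sentiment(texts: list[str], classifier: Optional[Callable] = None) -> list[dict]:
    """Classify each text and return one result dict per input text.

    `classifier` is a hypothetical hook for this sketch: pass any callable
    that maps a list of strings to a list of dicts to skip the model download.
    """
    if classifier is None:
        # Imported lazily so the helper also works with a stand-in classifier
        # on a machine that does not have transformers installed.
        from transformers import pipeline

        # With no model argument, the library chooses a default checkpoint
        # for the task and downloads it on first use.
        classifier = pipeline("sentiment-analysis")
    return classifier(texts)
```

Calling run_sentiment(["I have been waiting for this lesson."]) with no classifier should return a list with one dict holding a "label" and a "score"; the exact values depend on whichever default model the library selects.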
