References: The main NLP tasks
Source material
Source curriculum (structural mirror, cited as further study):

• Hugging Face, "LLM Course", Chapter 7: "Main NLP tasks"
  Authors: the Hugging Face team (Lewis Tunstall, Leandro von Werra, Lysandre Debut, Sylvain Gugger, Merve Noyan, and others)
  Course page: https://huggingface.co/learn/llm-course/chapter7
  Code and notebooks: https://github.com/huggingface/course
  License: Apache 2.0 (prose and code)
  Required attribution: "Based on the Hugging Face LLM Course (huggingface.co/learn/llm-course), © Hugging Face, used under the Apache 2.0 license. This is an independent structural mirror; Hugging Face does not endorse it."

This lesson mirrors the structure of Chapter 7 (token classification, masked and causal language modeling, summarization, translation, and question answering). Clawdemy's lessons are original prose that follows the pedagogical arc of the course. We do not reproduce or transcribe the course; we cite it as the recommended companion. Course materials are used under the Apache 2.0 license with the attribution above, which requires a link to the license and an indication of changes, and does not permit implying endorsement.

Read this next
- Hugging Face LLM Course, Chapter 7: Main NLP tasks. The chapter this lesson mirrors. Each task has its own section with a complete, runnable training script (and a from-scratch training-loop variant). When you have a specific task to build, that task’s section is the worked example to follow line by line.
Going deeper
A short, durable list. Each link is a specific next step, not a generic pile.
- The `transformers` task guides. The official how-to pages, one per task (token classification, question answering, summarization, translation, language modeling). The canonical reference for the exact data preparation each task needs.
- The `evaluate` metrics for these tasks. Where `seqeval`, `rouge`, `sacrebleu`, and `squad` live, with usage examples. Check the right metric for your task here before scoring.
- The SQuAD dataset. The standard extractive question-answering benchmark; its model cards and leaderboard show what “good” looks like and how the exact-match and F1 metrics behave.
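To make the exact-match and F1 behavior concrete, here is a minimal pure-Python sketch of SQuAD-style scoring for a single prediction against a single reference. It is an illustration, not the official scorer: the helper names (`normalize`, `exact_match`, `token_f1`) are made up for this example, scores are on a 0 to 1 scale, and taking the best score over several reference answers is left out. For real evaluation, use the `squad` metric from `evaluate`.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    prediction = "the Eiffel Tower in Paris"
    reference = "Eiffel Tower"
    print(exact_match(prediction, reference))  # 0.0: the strings differ
    print(round(token_f1(prediction, reference), 3))  # 0.667: partial credit
```

The example shows why the two numbers are usually reported together: a prediction that contains the right span plus extra words scores zero on exact match but still earns partial credit on F1.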
Adjacent topics
Where this connects inside the track.
- Fine-tune a pretrained model (lesson 3). The loop here is exactly the lesson-3 loop; every task just changes the head, the label shape, and the metric.
- Tokenizers up close (lesson 6). The word IDs and offsets introduced there are what make token classification and extractive question answering possible.
- Debug your training and get unstuck (lesson 8). When one of these task pipelines breaks (and they will, often at the data-alignment step), the next lesson is how you diagnose it.
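As a reminder of what the data-alignment step involves in token classification, the sketch below spreads word-level labels across subword tokens using word IDs. It rests on a few assumptions: the `word_ids` list is hand-written here in the shape a fast tokenizer's `word_ids()` returns (`None` for special tokens), `align_labels` is a hypothetical helper, and masking continuation subwords with -100 is one common convention rather than the only one.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def align_labels(word_labels, word_ids, label_all_subwords=False):
    """Map one label per word onto one label per token.

    word_labels: list of integer labels, one per word.
    word_ids: list with one entry per token; the index of the word the token
        came from, or None for special tokens.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens never contribute to the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation subword: either repeat the label or mask it out.
            aligned.append(word_labels[word_id] if label_all_subwords else IGNORE_INDEX)
        previous = word_id
    return aligned


if __name__ == "__main__":
    # Four words; the first and last are each split into two subwords.
    word_labels = [3, 0, 0, 5]
    word_ids = [None, 0, 0, 1, 2, 3, 3, None]
    print(align_labels(word_labels, word_ids))
    # [-100, 3, -100, 0, 0, 5, -100, -100]
```

A quick check when a pipeline misbehaves: the aligned label list must have the same length as the token list, and every position that is not masked should hold a valid label index.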