References: The main NLP tasks

Source curriculum (structural mirror, cited as further study):
• Hugging Face, "LLM Course", Chapter 7: "Main NLP tasks"
Authors: the Hugging Face team (Lewis Tunstall, Leandro von Werra,
Lysandre Debut, Sylvain Gugger, Merve Noyan, and others)
Course page: https://huggingface.co/learn/llm-course/chapter7
Code and notebooks: https://github.com/huggingface/course
License: Apache 2.0 (prose and code)
Required attribution: "Based on the Hugging Face LLM Course
(huggingface.co/learn/llm-course), © Hugging Face, used under the
Apache 2.0 license. This is an independent structural mirror;
Hugging Face does not endorse it."
This lesson mirrors the structure of Chapter 7 (token classification,
masked and causal language modeling, summarization, translation, and
question answering). Clawdemy's lessons are original prose that follows the
pedagogical arc of the course. We do not reproduce or transcribe the
course; we cite it as the recommended companion. Course materials are used
under the Apache 2.0 license with the attribution above. That license
requires a link to its text and an indication of changes, and it does not
permit implying endorsement.
  • Hugging Face LLM Course, Chapter 7: Main NLP tasks. The chapter this lesson mirrors. Each task has its own section with a complete, runnable training script (and a from-scratch training-loop variant). When you have a specific task to build, that task’s section is the worked example to follow line by line.

A short, durable list. Each link is a specific next step, not a generic pile.

  • The transformers task guides. The official how-to pages, one per task (token classification, question answering, summarization, translation, language modeling). The canonical reference for the exact data preparation each task needs.

  • The evaluate metrics for these tasks. Where seqeval (token classification), rouge (summarization), sacrebleu (translation), and squad (extractive question answering) live, with usage examples. Confirm which metric fits your task here before you score anything.

  • The SQuAD dataset. The standard extractive question-answering benchmark; its model cards and leaderboard show what “good” looks like and how the exact-match and F1 metrics behave.
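
To make the behavior of exact match and F1 concrete, here is a minimal pure-Python sketch. It assumes the answer normalization commonly used for SQuAD v1.1 scoring (lowercase, strip punctuation and English articles, collapse whitespace). The function names are illustrative, not part of any library; for real scoring, use the squad metric from evaluate rather than this sketch.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Casing, punctuation, and a leading article do not break exact match.
print(exact_match("The Eiffel Tower.", "Eiffel Tower"))
# Extra words zero out exact match, while F1 still gives partial credit.
print(exact_match("the Eiffel Tower in Paris", "Eiffel Tower"))
print(round(token_f1("the Eiffel Tower in Paris", "Eiffel Tower"), 3))
```

The contrast between the last two calls is the point: exact match is all-or-nothing, while F1 rewards a prediction that contains the reference span but runs long.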

Where this connects inside the track.

  • Fine-tune a pretrained model (lesson 3). The loop here is exactly the lesson-3 loop; every task just changes the head, the label shape, and the metric.

  • Tokenizers up close (lesson 6). The word IDs and offsets introduced there are what make token classification and extractive question answering possible.

  • Debug your training and get unstuck (lesson 8). When one of these task pipelines breaks (and one will, often at the data-alignment step), the next lesson shows you how to diagnose it.
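
As a reminder of what the data-alignment step looks like for token classification, here is a minimal sketch that spreads word-level labels over sub-word tokens using word IDs. It assumes one common convention: label only the first sub-token of each word and mask everything else with -100, the value PyTorch's cross-entropy loss ignores by default. The function name and the hand-written word_ids list are hypothetical; in practice the list would come from a fast tokenizer's word_ids() method, and other conventions (such as labeling every sub-token) are equally valid.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # skipped by PyTorch's cross-entropy loss by default


def align_labels_with_tokens(
    word_labels: List[int], word_ids: List[Optional[int]]
) -> List[int]:
    """Expand one label per word into one label per token.

    word_ids[i] is the index of the word that token i came from,
    or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)          # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first sub-token of a word
        else:
            aligned.append(IGNORE_INDEX)          # continuation sub-token
        previous = word_id
    return aligned


# Hypothetical example: three words, where the first and last words
# were each split into two sub-tokens.
word_labels = [3, 0, 7]
word_ids = [None, 0, 0, 1, 2, 2, None]
print(align_labels_with_tokens(word_labels, word_ids))
# → [-100, 3, -100, 0, 7, -100, -100]
```

A quick sanity check when a pipeline breaks here: the aligned list must have the same length as the token sequence, not the word sequence.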