
Summary: The main NLP tasks

This is where the track’s pieces assemble. Every common NLP task follows one loop (load and clean data, tokenize and align labels where the task needs it, load AutoModelFor<Task>, pick a collator and metric, train, evaluate, push), and only three things change between tasks: the head, the label shape, and the metric. Choosing a task is mostly choosing the model shape from lesson 1: understanding tasks want an encoder (sequence and token classification, question answering), generation wants a decoder (causal LM), and sequence-to-sequence wants an encoder-decoder (summarization, translation). Two wrinkles recur: token-level tasks (NER, QA) need fast-tokenizer word IDs and offsets to align labels and answer spans, and sequence-to-sequence tasks need Seq2SeqTrainer, DataCollatorForSeq2Seq, and generation metrics (ROUGE, BLEU). This is the scan version; the lesson builds the diagnosis skill.

  • One loop, three variables. The lesson-3 loop covers every task; what changes is the head (AutoModelFor<Task>), the shape of the labels, and the metric.
  • Choosing the task is choosing the shape. Encoder for understanding (classification, NER, QA), decoder for generation (causal LM), encoder-decoder for sequence-to-sequence (summarization, translation).
  • Token-level tasks need alignment. Token classification (AutoModelForTokenClassification, seqeval) and extractive QA (AutoModelForQuestionAnswering, SQuAD EM/F1) work in token positions and rely on fast-tokenizer word IDs and offsets.
  • Language modeling is self-supervised. Masked (AutoModelForMaskedLM, BERT-style) and causal (AutoModelForCausalLM, GPT-style) language modeling need no hand labels; DataCollatorForLanguageModeling builds them from the input text (set mlm=False for causal). Metric: perplexity.
  • Sequence-to-sequence needs its own tools. Summarization and translation use AutoModelForSeq2SeqLM, Seq2SeqTrainer, DataCollatorForSeq2Seq, and generation metrics (ROUGE, BLEU).
  • The applied skill is diagnosis. Name the task right and the head, data shape, and metric follow; most real mistakes are framing errors, not coding ones.
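
The alignment step named in the token-level bullet can be sketched in plain Python. This is a minimal illustration, not the lesson's own code. The word_ids list is hand-written to mimic what a fast tokenizer's word_ids() returns (None for special tokens, a word index for every other token), and the label ids are hypothetical. Masking continuation subtokens with -100 is one common convention; labeling every subtoken is another, exposed here through an assumed label_all_subtokens flag.

```python
# Label id that PyTorch's cross-entropy loss skips by default (ignore_index=-100).
IGNORE_INDEX = -100


def align_labels_with_word_ids(word_ids, word_labels, label_all_subtokens=False):
    """Expand one label per word into one label per token.

    word_ids:    one entry per token; None marks a special token.
    word_labels: one label id per original word.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First subtoken of a word takes the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subtokens of the same word: mask them or repeat the label.
            aligned.append(word_labels[word_id] if label_all_subtokens else IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Hand-written stand-in for tokenizer output: three words, the second of which
# is split into two subword tokens, wrapped in two special tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [1, 0, 3]  # hypothetical label ids, one per word

token_labels = align_labels_with_word_ids(word_ids, word_labels)
print(token_labels)  # [-100, 1, 0, -100, 3, -100]
```

Extractive QA uses the same idea with character offsets instead of word IDs: the answer's start and end characters are mapped to the token positions whose offsets contain them.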

This lesson reframes what is actually hard in an applied NLP project. It is rarely the training code, which barely changes from task to task; it is the framing. “Pull names out of contracts” is token classification; “is this ticket angry” is sequence classification; “answer this from the manual” is extractive QA or a generation task. Get the diagnosis right and you walk a well-trodden path with a known head, data shape, and metric. Get it wrong and you fight the tooling, reaching for a decoder when you needed an encoder or hand-rolling a metric that already exists. That diagnostic instinct, shape first, then head, then metric, is what separates someone who can read a problem from someone who only knows one recipe. With the task landscape mapped, Phase 2 closes by learning what to do when the training run breaks.
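
The shape-first, then head, then metric order described above can be written down as a lookup table. The sketch below stores class names as strings so it runs without transformers installed. Some entries are assumptions rather than statements from this summary: AutoModelForSequenceClassification, DataCollatorWithPadding, DataCollatorForTokenClassification, and DefaultDataCollator are real transformers classes chosen here for illustration, and "accuracy / F1" is a typical metric choice for sequence classification. TaskRecipe, RECIPES, EXAMPLE_DIAGNOSES, and recipe_for are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecipe:
    shape: str     # model shape: encoder, decoder, or encoder-decoder
    head: str      # AutoModelFor<Task> class name
    collator: str  # data collator class name
    metric: str    # evaluation metric


RECIPES = {
    # Collator and metric for this row are assumptions, not from the summary.
    "sequence-classification": TaskRecipe(
        "encoder", "AutoModelForSequenceClassification",
        "DataCollatorWithPadding", "accuracy / F1"),
    # Collator for this row is an assumption.
    "token-classification": TaskRecipe(
        "encoder", "AutoModelForTokenClassification",
        "DataCollatorForTokenClassification", "seqeval"),
    # Collator for this row is an assumption.
    "extractive-qa": TaskRecipe(
        "encoder", "AutoModelForQuestionAnswering",
        "DefaultDataCollator", "SQuAD EM/F1"),
    "masked-lm": TaskRecipe(
        "encoder", "AutoModelForMaskedLM",
        "DataCollatorForLanguageModeling(mlm=True)", "perplexity"),
    "causal-lm": TaskRecipe(
        "decoder", "AutoModelForCausalLM",
        "DataCollatorForLanguageModeling(mlm=False)", "perplexity"),
    "summarization": TaskRecipe(
        "encoder-decoder", "AutoModelForSeq2SeqLM",
        "DataCollatorForSeq2Seq", "ROUGE"),
    "translation": TaskRecipe(
        "encoder-decoder", "AutoModelForSeq2SeqLM",
        "DataCollatorForSeq2Seq", "BLEU"),
}

# The three problem statements from the paragraph above, each diagnosed as a task.
EXAMPLE_DIAGNOSES = {
    "Pull names out of contracts": "token-classification",
    "is this ticket angry": "sequence-classification",
    "answer this from the manual": "extractive-qa",  # could also be framed as generation
}


def recipe_for(problem: str) -> TaskRecipe:
    """Look up the recipe for one of the example problem statements."""
    return RECIPES[EXAMPLE_DIAGNOSES[problem]]


recipe = recipe_for("Pull names out of contracts")
print(recipe.shape, recipe.head, recipe.metric)
```

The table is the point, not the function: once the task name is fixed, every other column is a lookup rather than a design decision.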

The training loop barely changes from task to task. The skill that does the work is naming which task a problem is, because that one choice hands you the head, the data shape, and the metric all at once.