Cheatsheet: The main NLP tasks

  1. Load + clean data (datasets, map, filter)
  2. Tokenize (with task-specific alignment)
  3. Load AutoModelFor<Task>
  4. Pick a data collator + a metric
  5. Train with Trainer (or Seq2SeqTrainer)
  6. Evaluate + push to Hub

Only the head, the label shape, and the metric change between tasks.
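One way to see how little changes between tasks is to keep the varying parts in a lookup table and leave the rest of the recipe generic. The sketch below stores only class and metric names as strings, using the pairings listed in this cheatsheet. `TaskSpec`, `TASKS`, and `needs_seq2seq_trainer` are hypothetical names made up for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    head: str      # AutoModelFor<Task> class name
    collator: str  # data collator class name
    metric: str    # evaluation metric


# Extractive QA is left out because the collator table in this cheatsheet
# does not list a collator for it.
TASKS = {
    "sequence-classification": TaskSpec(
        "AutoModelForSequenceClassification", "DataCollatorWithPadding", "accuracy, F1"),
    "token-classification": TaskSpec(
        "AutoModelForTokenClassification", "DataCollatorForTokenClassification", "seqeval"),
    "masked-lm": TaskSpec(
        "AutoModelForMaskedLM", "DataCollatorForLanguageModeling", "perplexity"),
    "causal-lm": TaskSpec(
        "AutoModelForCausalLM", "DataCollatorForLanguageModeling", "perplexity"),
    "summarization": TaskSpec(
        "AutoModelForSeq2SeqLM", "DataCollatorForSeq2Seq", "ROUGE"),
    "translation": TaskSpec(
        "AutoModelForSeq2SeqLM", "DataCollatorForSeq2Seq", "BLEU / SacreBLEU"),
}


def needs_seq2seq_trainer(task: str) -> bool:
    """Sequence-to-sequence heads pair with Seq2SeqTrainer; the rest use Trainer."""
    return TASKS[task].head == "AutoModelForSeq2SeqLM"
```

Steps 1, 2, 5, and 6 of the recipe stay the same whichever entry is picked; only the three fields of the chosen `TaskSpec` (plus the trainer class for sequence-to-sequence tasks) differ.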

| Task | Head | Metric | Model shape |
|---|---|---|---|
| Sequence classification | AutoModelForSequenceClassification | accuracy, F1 | encoder |
| Token classification (NER) | AutoModelForTokenClassification | seqeval (entity F1) | encoder |
| Extractive QA | AutoModelForQuestionAnswering | SQuAD EM, F1 | encoder |
| Masked LM | AutoModelForMaskedLM | perplexity | encoder |
| Causal LM | AutoModelForCausalLM | perplexity | decoder |
| Summarization | AutoModelForSeq2SeqLM | ROUGE | encoder-decoder |
| Translation | AutoModelForSeq2SeqLM | BLEU / SacreBLEU | encoder-decoder |

| Collator | Used for |
|---|---|
| DataCollatorWithPadding | Sequence classification (dynamic padding) |
| DataCollatorForTokenClassification | Token classification (pads labels too) |
| DataCollatorForLanguageModeling | Masked LM (and causal LM with mlm=False) |
| DataCollatorForSeq2Seq | Summarization, translation |
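To make the difference between the first two collators concrete, here is a library-free sketch of dynamic padding: each batch is padded only to its own longest sequence, and for token classification the per-token labels are padded as well, with -100 so that the loss ignores those positions. `pad_batch` is a hypothetical helper written for illustration, not the library's implementation, and the pad id of 0 is an assumption (in practice it comes from the tokenizer).

```python
def pad_batch(features, pad_token_id=0, label_pad_id=-100):
    """Pad a list of feature dicts to the longest sequence in this batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": []}
    if "labels" in features[0]:
        batch["labels"] = []

    for f in features:
        n_real = len(f["input_ids"])
        n_pad = max_len - n_real
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * n_real + [0] * n_pad)
        if "labels" in f:
            labels = f["labels"]
            # Per-token labels (token classification) are padded too;
            # a single class id (sequence classification) is kept as is.
            if isinstance(labels, list):
                labels = labels + [label_pad_id] * n_pad
            batch["labels"].append(labels)
    return batch


# Token classification: label lists grow with the inputs.
ner_batch = pad_batch([
    {"input_ids": [101, 7, 102], "labels": [-100, 3, -100]},
    {"input_ids": [101, 7, 8, 9, 102], "labels": [-100, 1, 2, 2, -100]},
])

# Sequence classification: one label per example, nothing to pad.
cls_batch = pad_batch([
    {"input_ids": [1, 2], "labels": 1},
    {"input_ids": [1], "labels": 0},
])
```

Padding per batch rather than to a global maximum length keeps short batches short, which is the point of "dynamic" padding.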

Token classification and QA models predict over token positions, but labels and answers live at the word or character level. A fast tokenizer provides the mapping between the two:

  • word IDs: spread a word’s label across its tokens (NER)
  • offsets: map predicted start/end token positions back to characters (QA)
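The word-ID step for NER can be sketched without loading a tokenizer. The `word_ids` list below is written by hand to stand in for what a fast tokenizer's `word_ids()` method would return (one entry per token, `None` for special tokens), and the label ids are made up. `align_labels` is a hypothetical helper; this minimal version copies the word's label to every sub-token and does not handle B-/I- tag conversion.

```python
def align_labels(word_labels, word_ids, ignore_index=-100):
    """Give each token the label of the word it came from.

    Special tokens (word id None) get ignore_index so the loss skips them.
    """
    return [ignore_index if w is None else word_labels[w] for w in word_ids]


# Three words, e.g. ["Hugging", "Face", "rocks"], with illustrative label ids.
word_labels = [1, 2, 0]
# Assumed tokenization: [CLS], two pieces for word 0, one each for words 1 and 2, [SEP].
word_ids = [None, 0, 0, 1, 2, None]

token_labels = align_labels(word_labels, word_ids)
print(token_labels)  # [-100, 1, 1, 2, 0, -100]
```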

This is why fast tokenizers matter (lesson 6).
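For QA, the offsets play the matching role: the model predicts a start and an end token, and the offsets turn that pair back into a character span of the context. In this sketch the `offsets` list is written by hand to stand in for the offset mapping a fast tokenizer returns when called with `return_offsets_mapping=True`; the question tokens are left out to keep it short, and `span_to_text` is a hypothetical helper.

```python
def span_to_text(context, offsets, start_token, end_token):
    """Map an inclusive (start_token, end_token) prediction to a substring."""
    start_char = offsets[start_token][0]
    end_char = offsets[end_token][1]
    return context[start_char:end_char]


context = "Paris is the capital of France."
# One (start_char, end_char) pair per token; special tokens map to (0, 0).
offsets = [
    (0, 0),    # [CLS]
    (0, 5),    # Paris
    (6, 8),    # is
    (9, 12),   # the
    (13, 20),  # capital
    (21, 23),  # of
    (24, 30),  # France
    (30, 31),  # .
    (0, 0),    # [SEP]
]

print(span_to_text(context, offsets, 6, 6))  # France
print(span_to_text(context, offsets, 3, 4))  # the capital
```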

Summarization and translation need:

  • AutoModelForSeq2SeqLM
  • Seq2SeqTrainingArguments + Seq2SeqTrainer
  • DataCollatorForSeq2Seq (targets are sequences too, so labels are padded and shifted to form the decoder inputs)
  • generation-based metrics: ROUGE (summarization), BLEU (translation)
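The "shifted" part of the sequence-to-sequence setup can be illustrated in a few lines: the decoder input at each step is the previous target token, so the decoder inputs are the labels moved one position to the right behind a start token. This is a plain-Python sketch in the spirit of what T5- and BART-style models do, not the library code; `shift_right` is a hypothetical helper and all token ids are made up.

```python
def shift_right(labels, decoder_start_token_id, pad_token_id, ignore_index=-100):
    """Build decoder inputs from padded labels.

    Labels use ignore_index for padding (skipped by the loss); the decoder
    still needs real token ids, so those positions become pad_token_id.
    """
    shifted = [decoder_start_token_id] + labels[:-1]
    return [pad_token_id if t == ignore_index else t for t in shifted]


# Assumed ids: 1 is end-of-sequence, 0 is both the pad and decoder start token.
labels = [42, 43, 1, -100, -100]
decoder_input_ids = shift_right(labels, decoder_start_token_id=0, pad_token_id=0)
print(decoder_input_ids)  # [0, 42, 43, 1, 0]
```

At position i the decoder sees `decoder_input_ids[i]` and is trained to predict `labels[i]`, which is why the two sequences are offset by one.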

Language modeling is self-supervised: the text is its own supervision, and DataCollatorForLanguageModeling builds the targets:

  • Masked LM (BERT): random masking, mlm=True (default)
  • Causal LM (GPT): next-token, mlm=False
  • Metric: perplexity (lower is better)
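The three bullets above can be sketched in plain Python. This is a simplified illustration, not the collator's implementation: `mask_tokens` always substitutes the mask token and does not protect special tokens, whereas the real collator by default also leaves some selected tokens unchanged or swaps in random ones. The function names and token ids are hypothetical.

```python
import math
import random

IGNORE = -100  # label value the loss skips


def mask_tokens(input_ids, mask_token_id, mlm_probability=0.15, rng=None):
    """Masked LM: hide random tokens; only hidden positions carry a label."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in input_ids:
        if rng.random() < mlm_probability:
            masked.append(mask_token_id)
            labels.append(tok)      # predict the original token here
        else:
            masked.append(tok)
            labels.append(IGNORE)   # no loss on unmasked positions
    return masked, labels


def causal_targets(input_ids, pad_token_id):
    """Causal LM (mlm=False): labels copy the inputs, with padding ignored.

    The one-position shift for next-token prediction happens inside the model.
    """
    return [IGNORE if t == pad_token_id else t for t in input_ids]


def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean per-token loss (in nats)."""
    return math.exp(mean_cross_entropy)


ids = [11, 12, 13, 14, 15, 16, 17, 18]
masked, mlm_labels = mask_tokens(ids, mask_token_id=99, rng=random.Random(0))
clm_labels = causal_targets([11, 12, 13, 0, 0], pad_token_id=0)
```

A mean loss of 0 gives a perplexity of 1, the best possible value; a model that is as uncertain as a uniform choice among 20 tokens has a mean loss of ln 20 and a perplexity of 20.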
Key terms:

  • Token classification: one label per token (NER, POS tagging).
  • Extractive QA: returns a span of the source text, unlike generative QA, which writes a new answer (and can hallucinate).
  • Perplexity: a language-model quality metric, the exponential of the mean per-token cross-entropy loss; lower is better.
  • seqeval / ROUGE / BLEU / SQuAD: standard metrics for NER / summarization / translation / QA, respectively.

Source: Hugging Face LLM Course, Chapter 7: “Main NLP tasks.” huggingface.co/learn/llm-course/chapter7. Released under Apache 2.0; this lesson mirrors its structure with original prose.