Run a model in a few lines: pipelines

In the last lesson you got the working picture: tokens in, tokens out, attention in the middle. Now you make one of these models actually run, and the first surprise is how little code it takes. Two lines get you a working sentiment classifier. The rest of this lesson is about what those two lines hide, because once you can open the box, you can do far more than the one-liner allows.

Keep a Python environment open as you read (a Colab notebook is the zero-setup option). Everything here is meant to be run, not just looked at. If you have not installed the library yet, install transformers along with PyTorch, which the examples below use for tensors, and you are set.

The fastest path: the pipeline function

The transformers library gives you a single function that wraps an entire task end to end. You name the task, it picks a sensible default model, downloads it, and gives you something you can call like a function.

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a course like this my whole life.",
        "I hate this so much!",
    ]
)

[{'label': 'POSITIVE', 'score': 0.9598},
 {'label': 'NEGATIVE', 'score': 0.9995}]

That is the whole thing. No model files to hunt down, no preprocessing to write, no output decoding. The first run downloads the model and caches it; later runs load it from the local cache instead of downloading again.

The same function covers a long list of tasks just by changing the string you pass: sentiment-analysis, zero-shot-classification, text-generation, summarization, translation, question-answering, fill-mask, ner (named-entity recognition), and more. You can also hand it a specific model from the Hub instead of the default by passing the model name as an argument. For a huge amount of practical work, this is genuinely all you need: pick the task, pick a model, call it.
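As a minimal sketch of handing the pipeline an explicit model, the helper below pins the sentiment task to a named checkpoint through the model argument of pipeline. The helper name build_classifier and the constant are illustrative, not part of the library; the checkpoint is the DistilBERT one this lesson loads by hand later. The import sits inside the function, so nothing is downloaded until you call it.

```python
# Illustrative constant: the DistilBERT sentiment checkpoint used later in this lesson.
SENTIMENT_CHECKPOINT = "distilbert-base-uncased-finetuned-sst-2-english"


def build_classifier(checkpoint=SENTIMENT_CHECKPOINT):
    """Return a sentiment pipeline pinned to an explicit checkpoint.

    build_classifier is a hypothetical helper name, not a library function.
    The first call downloads the checkpoint and caches it.
    """
    # Imported lazily so defining the helper needs neither the library nor a network.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=checkpoint)
```

Calling build_classifier() gives you an object you use exactly like the classifier above. Passing a different Hub name as the argument swaps the model without touching the rest of your code.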

So why go further? Because the pipeline makes choices for you, and the moment you want to make those choices yourself (a different model, a custom postprocessing step, raw scores instead of labels, batching your own way) you need to know what it was doing. It was doing three things.

The three steps the pipeline hides

Every pipeline groups three stages: preprocessing (turn text into numbers a model can read), the model (run those numbers through the network), and postprocessing (turn the model’s raw output back into something meaningful). Let us reproduce the sentiment example by hand, one stage at a time, using the Auto classes. The default model behind sentiment-analysis is a DistilBERT checkpoint, so we will load that explicitly (its exact name is in the code below).

Step 1: the tokenizer

A transformer cannot read raw text; it reads integers. The tokenizer is the piece that converts between the two, and it must split and number text in exactly the way the model saw during training. You get the matching tokenizer by calling AutoTokenizer.from_pretrained with the checkpoint name:

from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a course like this my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

{
    'input_ids': tensor([[ 101, 1045, 1005, ... , 102],
                         [ 101, 1045, 5223, ... ,   0]]),
    'attention_mask': tensor([[1, 1, 1, ... , 1],
                              [1, 1, 1, ... , 0]])
}

Two things come back. The input IDs (input_ids) are the text as integers, one row per sentence. The attention mask (attention_mask) is a row of 1s and 0s telling the model which positions are real tokens and which are padding. Three arguments earn their keep here. padding=True makes the two sentences the same length by filling the shorter one with a padding token (the trailing zeros). truncation=True cuts any input longer than the model’s maximum length. And return_tensors="pt" asks for PyTorch tensors, which is what the model expects as input. Pass one sentence or a list; you get back a dictionary ready to feed straight to the model.
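To make padding and the attention mask concrete, here is a plain-Python sketch of what padding a batch produces. It is not the tokenizer’s implementation: pad_batch is a hypothetical helper, it assumes padding on the right with ID 0 (matching the trailing zeros above), and the token IDs between 101 and 102 are arbitrary placeholders rather than real vocabulary entries.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad every sequence to the longest one and build the matching mask."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)       # real IDs, then padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Placeholder IDs: 101 and 102 mirror the start/end IDs in the output above,
# the values in between are made up for illustration.
batch = pad_batch([[101, 7, 8, 9, 102], [101, 7, 102]])
print(batch["input_ids"])       # [[101, 7, 8, 9, 102], [101, 7, 102, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The mask has the same shape as the IDs, which is what lets the model ignore the padded positions.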

Step 2: the model

You download the model the same way, with from_pretrained and the same checkpoint. The base class, AutoModel, gives you the transformer itself, which outputs hidden states (also called features): a high-dimensional vector for every token, representing the model’s contextual understanding of the input.

from transformers import AutoModel

model = AutoModel.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

torch.Size([2, 16, 768])

That shape is worth reading: 2 sentences, 16 token positions each (the shorter sentence is padded to match), and a 768-dimensional vector per position. Those hidden states are rich, but they are not an answer. To get an answer you need a head: a small layer (or a few) bolted on top of the base model that projects the hidden states down to whatever the task needs. The library ships many head variants, and the class name tells you which:

Class	What it does
AutoModel	just the hidden states, no head
AutoModelForSequenceClassification	classify the whole input
AutoModelForTokenClassification	label each token (e.g. NER)
AutoModelForQuestionAnswering	find an answer span in a context
AutoModelForCausalLM	generate the next token (decoder-only)
AutoModelForMaskedLM	fill in blanks (encoder-only)

Sentiment is whole-sentence classification, so we want the sequence-classification head:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits.shape)

torch.Size([2, 2])

Now the output is 2 by 2: two sentences, two labels (positive and negative). The head collapsed those 768-dimensional vectors into one score per label.
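A toy sketch of what "collapsing" means, in plain Python with tiny dimensions. This is not DistilBERT’s actual head, which has more layers; it assumes the common BERT-style choice of classifying from the first token’s vector, and toy_head, the weights, and the 4-dimensional hidden states are all made up for illustration.

```python
def toy_head(hidden_states, weights, biases):
    """Project each sequence's first-token vector to one logit per label."""
    logits = []
    for sequence in hidden_states:   # one entry per sentence in the batch
        first_token = sequence[0]    # vector at the first position
        logits.append([
            sum(w * x for w, x in zip(label_weights, first_token)) + bias
            for label_weights, bias in zip(weights, biases)
        ])
    return logits


# Shape [2, 3, 4]: 2 sentences, 3 positions, 4 dimensions (stand-in for [2, 16, 768]).
hidden_states = [
    [[1.0, 0.0, 2.0, 0.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0, 1.0], [0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]],
]
weights = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # one row of weights per label
biases = [0.0, 0.5]

logits = toy_head(hidden_states, weights, biases)
print(logits)  # [[1.0, 2.5], [1.0, 1.5]]  -> shape [2, 2]
```

The per-token dimension disappears and the vector dimension shrinks to the number of labels, which is the same change of shape you saw going from [2, 16, 768] to [2, 2].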

Step 3: postprocessing

The model’s raw output looks like this:

print(outputs.logits)

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]])

Those are not probabilities. They are logits, the raw unnormalized scores from the last layer. Transformer models output logits because the training loss typically fuses the final softmax into the loss computation, so the model itself stops one step short. You finish the step with a softmax, which squashes each row into values between 0 and 1 that sum to 1:

import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)

tensor([[0.0402, 0.9598],
        [0.9995, 0.0005]])
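If you want to see the arithmetic the softmax performs, here is a plain-Python sketch applied to the logits printed above. It illustrates the formula (exponentiate, then divide by the row total) and is not how PyTorch implements it; subtracting the row maximum first is a standard trick to keep the exponentials from overflowing.

```python
import math


def softmax(row):
    """Turn one row of logits into probabilities that sum to 1."""
    peak = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


logits = [[-1.5607, 1.6123], [4.1692, -3.3464]]  # the logits printed above
probabilities = [softmax(row) for row in logits]
rounded = [[round(p, 4) for p in row] for row in probabilities]
print(rounded)  # [[0.0402, 0.9598], [0.9995, 0.0005]]
```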

The last missing piece is which column is which label. The model carries that mapping in its config:

print(model.config.id2label)

{0: 'NEGATIVE', 1: 'POSITIVE'}

Put it together and the first sentence is 95.98% positive and the second is 99.95% negative: the same numbers the one-line pipeline reported. You have just reproduced it by hand.
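One way to package the three steps is sketched below. The function names attach_labels and classify are illustrative, not library functions; the library calls inside classify are the same ones used step by step above. The imports sit inside classify so that the label-attaching logic can be tried on its own, without the libraries or a download.

```python
def attach_labels(probabilities, id2label):
    """Pick the highest-probability column in each row and name it."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda index: row[index])
        results.append({"label": id2label[best], "score": row[best]})
    return results


def classify(texts, checkpoint="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run tokenizer, model, and postprocessing by hand (downloads on first use)."""
    # Lazy imports: attach_labels stays usable without torch or transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, so skip gradient tracking
        logits = model(**inputs).logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1).tolist()
    return attach_labels(probabilities, model.config.id2label)


# Postprocessing alone, fed the rounded probabilities from this lesson:
print(attach_labels([[0.0402, 0.9598], [0.9995, 0.0005]], {0: "NEGATIVE", 1: "POSITIVE"}))
# [{'label': 'POSITIVE', 'score': 0.9598}, {'label': 'NEGATIVE', 'score': 0.9995}]
```

Because the stages are separate, you can replace any one of them (a different checkpoint, a different postprocessing rule) without rewriting the others.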

The Auto classes, as a pattern

Notice the shape of what you did. Every load was the same call: an Auto class’s from_pretrained method, handed a checkpoint. That is the core idiom of the whole library. The “Auto” part means you do not have to know whether the checkpoint is a BERT, a DistilBERT, or a GPT under the hood; the class inspects the checkpoint and instantiates the right architecture for you. You change models by changing one string.

This is why the ecosystem feels uniform even though it spans hundreds of architectures. Want a different sentiment model? Change the checkpoint. Want a different task on the same text? Change the head class (AutoModelForTokenClassification instead of AutoModelForSequenceClassification) and pick a checkpoint trained for that task, because a head the checkpoint was not trained with starts from random weights. The tokenizer, the model, and the head all load through the same from_pretrained door, keyed by a name from the Hub.
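The dispatch idea can be illustrated with a toy registry. This is a sketch of the pattern, not the library’s implementation: every class and name below is hypothetical, and it assumes the checkpoint’s configuration names its architecture in a model_type field.

```python
class ToyDistilBert:
    """Stand-in for one concrete architecture."""

    def __init__(self, config):
        self.config = config


class ToyGpt2:
    """Stand-in for a different architecture."""

    def __init__(self, config):
        self.config = config


# Hypothetical registry: architecture name -> class that implements it.
ARCHITECTURES = {"distilbert": ToyDistilBert, "gpt2": ToyGpt2}


class ToyAutoModel:
    """Looks at the config and instantiates whichever architecture it names."""

    @classmethod
    def from_config(cls, config):
        model_type = config["model_type"]
        if model_type not in ARCHITECTURES:
            raise ValueError(f"No architecture registered for {model_type!r}")
        return ARCHITECTURES[model_type](config)


model = ToyAutoModel.from_config({"model_type": "distilbert"})
print(type(model).__name__)  # ToyDistilBert
```

The caller never names the concrete class, which is why swapping checkpoints only means swapping a string.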

Why this matters when you use AI

Two takeaways carry forward into every later lesson.

The pipeline is the right tool until it is not. For exploring, prototyping, and a surprising amount of production work, the one-liner is correct and you should reach for it first. Do not hand-roll the three steps when the pipeline already does them well.

But the three steps are always there, and knowing them is what lets you customize. When you need raw logits for a custom threshold, a specific model the pipeline does not default to, control over batching and padding, or a head the pipeline does not expose, you drop down to the Auto classes and do the three steps yourself. Everything later in this track (fine-tuning, the tokenizer internals, the NLP tasks) lives at this lower level. The pipeline was the elevator; the Auto classes are the stairs, and you want to know where the stairs are.
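As one example of what the lower level buys you, here is a sketch of a custom confidence threshold applied to the probabilities. The helper label_with_threshold, the 0.9 cutoff, and the "UNSURE" fallback are all illustrative choices, and the third row of probabilities is a made-up borderline case.

```python
def label_with_threshold(probabilities, id2label, threshold=0.9, fallback="UNSURE"):
    """Name the top label per row, or return the fallback when confidence is low."""
    labels = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda index: row[index])
        labels.append(id2label[best] if row[best] >= threshold else fallback)
    return labels


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
probabilities = [
    [0.0402, 0.9598],  # first sentence from this lesson
    [0.9995, 0.0005],  # second sentence from this lesson
    [0.4500, 0.5500],  # hypothetical borderline input, made up for illustration
]
print(label_with_threshold(probabilities, id2label))
# ['POSITIVE', 'NEGATIVE', 'UNSURE']
```

The pipeline would have labeled the third row POSITIVE with a score of 0.55; working from the probabilities lets you decide what counts as confident enough.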

What you should remember

A pipeline call runs a whole task at once: it picks a default model, downloads it, and handles preprocessing and postprocessing for you. Reach for it first.
A pipeline hides three steps: preprocessing with a tokenizer, the model forward pass, and postprocessing.
The tokenizer turns text into numbers. Loaded with AutoTokenizer.from_pretrained, it produces the input IDs and an attention mask; the padding, truncation, and return_tensors arguments shape the output.
The base class, AutoModel, gives the bare hidden states; the task-specific model classes add a head that turns those hidden states into task-specific output. The class name tells you the task.
Models output logits, not probabilities. Apply a softmax to get probabilities, and read the id2label mapping in the model’s config to attach labels.
The whole library is one idiom: an Auto class’s from_pretrained method, handed a checkpoint. Change the string to change the model; change the head class to change the task.

The pipeline is two lines because the library hid three steps inside it. Learn the three steps and you stop being limited to what the two lines allow.