Pairwise LaaJ also generates training data for reward models (Phase 4). The lecturer flags this as a now-common loop: humans rate a small calibration sample, LaaJ rates a much larger training sample, and the resulting preference labels train the reward model used to align future models.
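A minimal sketch of the calibration step in that loop, assuming its purpose is to check how often the judge agrees with the human raters before trusting it on the large sample. The function name, the label values, and the 0.8 threshold are illustrative assumptions, not something the lecture specifies.

```python
def agreement_rate(human_labels, judge_labels):
    """Fraction of calibration items where the judge matches the human label."""
    if len(human_labels) != len(judge_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("calibration sample is empty")
    matches = sum(h == j for h, j in zip(human_labels, judge_labels))
    return matches / len(human_labels)


# Hypothetical calibration sample: which response ("A" or "B") each rater preferred.
human = ["A", "B", "A", "A"]
judge = ["A", "B", "B", "A"]

rate = agreement_rate(human, judge)

# Assumed threshold: only let the judge label the large sample if it tracks humans well enough.
MIN_AGREEMENT = 0.8
judge_is_trusted = rate >= MIN_AGREEMENT
```

Raw agreement is the simplest possible check; a chance-corrected statistic such as Cohen's kappa is a common alternative when one label dominates the sample.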
LLM-as-a-Judge (LaaJ): the practice of using one LLM to evaluate the output of another LLM.
Pointwise LaaJ: judge a single response (absolute PASS/FAIL).
Pairwise LaaJ: judge between two candidate responses (relative preference).
Criteria: the description in the LaaJ prompt of what counts as good.
Rationale: the written reasoning the judge produces before the score.
Position bias: the judge prefers the first-listed response, regardless of quality.
Verbosity bias: the judge prefers the longer response, regardless of quality.
Self-enhancement bias: the judge prefers responses that it generated itself.
Structured output: API feature that constrains the LLM’s output to a JSON schema; required for reliable parsing in production.
Synthetic preference data: preference labels generated by pairwise LaaJ instead of human raters; feeds back into reward-model training.
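To make the Rationale and Structured output entries concrete, here is a sketch of validating one pairwise verdict on the receiving side, using only the standard library. The field names ("rationale", "winner") and the allowed winner values are assumptions chosen for illustration; no particular provider's schema is implied.

```python
import json

# Assumed verdict vocabulary for a pairwise judge.
ALLOWED_WINNERS = {"A", "B", "tie"}


def parse_verdict(raw: str) -> dict:
    """Parse and validate one judge verdict; raise ValueError if it is malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("verdict must be a JSON object")

    rationale = data.get("rationale")
    winner = data.get("winner")

    # The rationale is the judge's written reasoning; reject verdicts without one.
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValueError("verdict is missing a rationale")
    if not isinstance(winner, str) or winner not in ALLOWED_WINNERS:
        raise ValueError(f"unexpected winner: {winner!r}")

    return {"rationale": rationale, "winner": winner}


verdict = parse_verdict('{"rationale": "B cites its sources.", "winner": "B"}')
```

Even when the API constrains the output to a schema, a check like this keeps a malformed or truncated verdict from silently entering the preference dataset.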
Evaluating an LLM is itself an LLM-shaped problem. LaaJ scales human-style evaluation by replacing the human with another LLM. The three biases (position, verbosity, self-enhancement) are real, measured, and mitigable, but never fully eliminated.
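One common way to mitigate position bias is to run each pairwise comparison twice with the order swapped and keep only verdicts that survive the swap. The sketch below assumes a hypothetical judge callable that takes (prompt, first, second) and returns "first" or "second"; the two toy judges are stand-ins for a real LLM call and exist only to show the behavior.

```python
from typing import Callable

# Hypothetical judge interface: (prompt, first, second) -> "first" or "second".
Judge = Callable[[str, str, str], str]


def debiased_preference(judge: Judge, prompt: str, resp_a: str, resp_b: str) -> str:
    """Return "A", "B", or "inconsistent" after judging in both orders."""
    pass_1 = judge(prompt, resp_a, resp_b)  # A is listed first
    pass_2 = judge(prompt, resp_b, resp_a)  # B is listed first
    for result in (pass_1, pass_2):
        if result not in ("first", "second"):
            raise ValueError(f"unexpected judge output: {result!r}")

    a_wins_1 = pass_1 == "first"
    a_wins_2 = pass_2 == "second"
    if a_wins_1 and a_wins_2:
        return "A"
    if not a_wins_1 and not a_wins_2:
        return "B"
    return "inconsistent"  # the verdict flipped with the order


def always_first(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure position bias."""
    return "first"


def prefers_longer(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure verbosity bias."""
    return "first" if len(first) >= len(second) else "second"


swap_catches_position = debiased_preference(always_first, "q", "short", "a much longer answer")
swap_misses_verbosity = debiased_preference(prefers_longer, "q", "short", "a much longer answer")
```

The position-biased toy judge comes back "inconsistent", so its verdict can be discarded or re-judged. The verbosity-biased one still returns a confident "B" in both orders, which illustrates why the biases are mitigable but not fully eliminated by any single check.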