References: Multimodal agents in production

Source material

• Stanford CS25 V5 (April 8, 2025):
  "RL as a Co-Design of Product and Research"
  Speaker: Karina Nguyen (then at OpenAI; prior work on Claude at Anthropic,
           and earlier R&D collaborations including with The New York Times)
  YouTube: https://www.youtube.com/watch?v=gLwiPrwUDJ8
  Course site: https://web.stanford.edu/class/cs25/past/cs25-v5/
  License (lecture video): as published on Stanford's public CS25 YouTube
                           channel (link-out only)

Clawdemy provides original notes, summaries, and quizzes derived from this
material for educational purposes. All rights to the original lecture remain
with Stanford and the speaker.

What this lesson draws from each source

Karina Nguyen’s CS25 V5 lecture anchors the topic and the central concepts: the tight co-design loop between research and product, evaluation metrics for real-world usability vs traditional benchmarks, RLHF and RLAIF as practical post-training levers, and the asymmetric-verification idea (“checking is easier than generating”) as a structural principle that recurs in modern training-loop design.
The recap of research vs production constraints, the enumeration of multimodal-specific production challenges (variable input sizes, output streaming quirks, tool-use latency, cross-modal quality calibration), and the explicit engineering-informs-vs-settles discipline are Clawdemy’s own connective tissue around the lecture’s content.

Going deeper

Karina Nguyen’s talk page. The speaker’s own catalogue of public talks and writings; the lesson source is among them, and several adjacent talks unpack the asymmetric-verification and RLAIF-co-design themes in more depth.
Stanford CS25 V5 schedule. The V5 lineup; useful context for where this lecture sits relative to the rest of the series.
“Constitutional AI: Harmlessness from AI Feedback” (Bai et al., Anthropic, 2022). A foundational reference for the RLAIF family: how AI-generated feedback can replace or supplement human feedback in alignment training. It predates Karina Nguyen’s lecture but is the canonical first articulation of the RLAIF idea in the public literature.

Adjacent topics

RLHF (Reinforcement Learning from Human Feedback). The well-known post-training paradigm and one of the standard tools in the production co-design toolkit. Covered in depth in adjacent Clawdemy tracks.
Evaluation harness design. A research area in its own right: how do you build an evaluation system that measures what users actually need, not just what is convenient to score? This work exists to close the gap between benchmark scores and real-world usability.
The closer of this track (lesson 10). Synthesizes the cross-cutting threads from lessons 1 through 9 and names the frontiers not covered, including some that this lesson’s production-engineering lens implies but does not develop.

Community discussion

None selected for this lesson at present. The lecture and the speaker’s talk catalogue are the strongest first-party sources. If a canonical secondary discussion (e.g., a widely shared synthesis of production multimodal AI engineering patterns) surfaces in a form suited to a Clawdemy track, it will be added at the next review.