
References: Multimodal agents in production

Source material:
• Stanford CS25 V5 (April 8, 2025):
"RL as a Co-Design of Product and Research"
Speaker: Karina Nguyen (then at OpenAI; prior work on Claude at Anthropic,
and earlier R&D collaborations, including with The New York Times)
YouTube: https://www.youtube.com/watch?v=gLwiPrwUDJ8
Course site: https://web.stanford.edu/class/cs25/past/cs25-v5/
License (lecture video): as published on Stanford's public CS25 YouTube
channel (link-out only)
Clawdemy provides original notes, summaries, and quizzes derived from this
material for educational purposes. All rights to the original lecture remain
with Stanford and the speaker.
  • Karina Nguyen’s CS25 V5 lecture anchors the topic and the central concepts: the tight co-design loop between research and product, evaluation metrics for real-world usability vs traditional benchmarks, RLHF and RLAIF as practical post-training levers, and the asymmetric-verification idea (“checking is easier than generating”) as a structural principle that recurs in modern training-loop design.
  • The recap of research vs production constraints, the enumeration of multimodal-specific production challenges (variable input sizes, output streaming quirks, tool-use latency, cross-modal quality calibration), and the explicit engineering-informs-vs-settles discipline are Clawdemy’s own connective tissue around the lecture’s content.
  • Karina Nguyen’s talk page. The speaker’s own catalogue of public talks and writings; the lesson source is among them, and several adjacent talks unpack the asymmetric-verification and RLAIF-co-design themes in more depth.
  • Stanford CS25 V5 schedule. The V5 lineup; useful context for where this lecture sits relative to the rest of the series.
  • “Constitutional AI: Harmlessness from AI Feedback” (Bai et al., Anthropic, 2022). A foundational reference for the RLAIF family: how AI-generated feedback can replace or supplement human feedback in alignment training. It predates Karina Nguyen’s lecture and is the canonical first articulation of the RLAIF idea in the public literature.
  • RLHF (Reinforcement Learning from Human Feedback). The well-known post-training paradigm and one of the standard tools in the production co-design toolkit. Covered in depth in adjacent Clawdemy tracks.
  • Evaluation harness design. A research area in its own right: how do you build an evaluation system that measures what users actually need, not just what is convenient to score? The benchmark-vs-usability gap is what this work exists to address.
  • The closer of this track (lesson 10). Synthesizes the cross-cutting threads from lessons 1 to 9 and names the frontiers not covered, including some that this lesson’s production-engineering lens implies but does not develop.

None selected for this lesson at present. The lecture and the speaker’s talk catalogue are the strongest first-party sources. If a canonical secondary discussion (e.g., a widely shared synthesis of production multimodal AI engineering patterns) surfaces in a form suited to this Clawdemy track, it will be added at the next review.