References: Multimodal world models for science

Source material:
• Stanford CS25 V5 (May 20, 2025):
"Multimodal World Models for Drug Discovery"
Speaker: Eshed Margalit (Noetik.ai; PhD in neuroscience, Stanford)
YouTube: https://www.youtube.com/watch?v=8kXIaUM3h1E
Course site: https://web.stanford.edu/class/cs25/past/cs25-v5/
License (lecture video): as published on Stanford's public CS25 YouTube
channel (link-out only)
Clawdemy provides original notes, summaries, and quizzes derived from this
material for educational purposes. All rights to the original lecture remain
with Stanford and the speaker.
  • Eshed Margalit’s CS25 V5 lecture anchors the topic: multimodal world models applied to drug discovery, with Noetik.ai’s OCTO virtual-cell and Perturb-map work as the concrete production example. The lecture covers the biological data heterogeneity challenge and the world-model framing as Noetik’s response.
  • Clawdemy’s own connective tissue around the lesson’s hard discipline boundary: the recap of generative pretraining’s data-abundance assumption (and why it fails for biology), the side-by-side contrast of benchmark and clinical claims with explicit instruments per claim type, the operational scope test specialized for medical AI, and the named conflation pitfall.
  • Noetik.ai’s OCTO virtual-cell project page. The company’s public framing of multimodal world models as simulators of patient biology, with technical context on what data streams are fused and what perturbation effects are predicted. The strongest first-party source for the work the source lecture covers.
  • Noetik AACR 2025 announcement. The American Association for Cancer Research presentation of Noetik’s OCTO + Perturb-map work; useful as a second source for the company’s scientific framing.
  • Stanford CS25 V5 schedule. The full V5 lineup; useful context for where this lecture sits in the series.
  • Translational science as a discipline. The field that studies how (and how often) benchmark-quality biological results translate to clinical-quality outcomes. The benchmark-vs-clinical gap this lesson names is precisely what translational science exists to study and (sometimes) close.
  • Cell painting and high-content imaging. The technical foundation for cell microscopy, one of the major data modalities that drug-discovery multimodal models fuse. Worth knowing about as the imaging side of the data pipeline.
  • Multimodal agents in production (the next lesson). Returns to consumer products: the engineering questions for multimodal AI in shipping products (RL co-design with the product) rather than the scientific applications covered in this lesson.

None selected for this lesson at present. Noetik’s own public pages plus the source lecture are the strongest public account of the specific application. The translational-science literature is large but has not been condensed into a canonical reading; if a strong public discussion of the benchmark-vs-clinical gap surfaces in a form suited to the Clawdemy track, it will be added at the next review.