References: Joint embedding predictive architectures (JEPA) and world modeling
Source material
- Stanford CS25 V6 (April 9, 2026): "From Representation Learning to World Modeling through Joint Embedding Predictive Architectures"
  Speakers: Hazel Nam and Lucas Maes (Brown University)
  YouTube: https://www.youtube.com/watch?v=GBd7iuJkW08
  Course site: https://web.stanford.edu/class/cs25/
  License (lecture video): as published on Stanford's public CS25 YouTube channel (link-out only)
Clawdemy provides original notes, summaries, and quizzes derived from this material for educational purposes. All rights to the original lecture remain with Stanford and the speakers.
What this lesson draws from each source
- Nam and Maes’s CS25 V6 lecture anchors the topic and supplies the bridge from representation learning (I-JEPA / V-JEPA) to world modeling. The lecture’s framing of “from representation learning to world modeling” is the structural arc this lesson mirrors.
- The explicit recap of generative pretraining as the dominant Phases 2-3 objective, the “surface-reproduction tax” articulation, the side-by-side generative-vs-JEPA comparison table, and the operational scope test applied to JEPA + world-modeling territory are Clawdemy’s own connective tissue.
Going deeper
- “Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture” (Assran et al., I-JEPA, 2023). The I-JEPA paper, the reference account of the recipe described here. Section 3 walks through the architecture and training loop in detail.
- “Revisiting Feature Prediction for Learning Visual Representations from Video” (Bardes et al., V-JEPA, 2024). The V-JEPA paper, extending the I-JEPA recipe to spacetime patches of video.
- “A Path Towards Autonomous Machine Intelligence” (LeCun, 2022). LeCun’s white paper laying out the world-modeling thesis that JEPA instantiates. Position paper rather than experimental, but the strongest single account of why he argues this direction matters.
Adjacent topics
- Self-supervised learning more broadly. Masked autoencoding (MAE), contrastive learning (SimCLR family), and JEPA all approach the question “learn good representations without labels” from different angles. Reading them as a family clarifies the tradeoffs.
- World models in reinforcement learning. The Dreamer family of model-based RL systems is one of the longest-running lines of work on world models for planning; comparing its generative frame-prediction approach to a JEPA-style alternative is an active research frontier.
- Multimodal world models for science (the next lesson). Takes the world-modeling idea into a specific scientific application (drug discovery) where multimodal data streams need to be fused, and shows how the framing pays off in practice.
Community discussion
None selected for this lesson at the present time. The I-JEPA and V-JEPA papers plus LeCun’s position paper together are the strongest public account of the direction. If a canonical secondary discussion surfaces, it will be added at the next review.