
References: Model-based RL, learning the dynamics

Primary sources (load-bearing for this lesson)

  • Sutton, R. S. (1991). Dyna, an integrated architecture for learning, planning, and reacting. ACM SIGART Bulletin, 2(4), 160-163. https://dl.acm.org/doi/10.1145/122344.122377 The original Dyna architecture. Sutton & Barto Chapter 8 has the modern treatment.
  • Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. Free online: http://incompleteideas.net/book/the-book-2nd.html Chapter 8 (Planning and Learning) is the canonical treatment of Dyna and the model-based / model-free distinction.
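Chapter 8 of Sutton & Barto presents tabular Dyna-Q, in which each real step drives three processes: a direct RL update, a model update, and several planning updates replayed from the learned model. The following is a minimal sketch of that loop, assuming a hypothetical six-state deterministic chain. The environment, the names `step` and `dyna_q`, and every hyperparameter are illustrative choices, not taken from either source.

```python
import random

N_STATES = 6        # hypothetical chain: states 0..5, state 5 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(s, a):
    """Deterministic toy environment: reward 1 only on reaching the goal."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return (1.0 if done else 0.0), s2, done

def dyna_q(episodes=30, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (s, a) -> last observed (reward, next_state, done)

    def greedy(s):
        # Break ties randomly so the untrained agent still wanders.
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup, used for both real and simulated experience.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            r, s2, done = step(s, a)            # real experience
            update(s, a, r, s2, done)           # direct RL
            model[(s, a)] = (r, s2, done)       # model learning
            for _ in range(planning_steps):     # planning from the learned model
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q

q = dyna_q()
# Greedy action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

Setting `planning_steps=0` reduces the loop to plain Q-learning, which is the model-free end of the distinction Chapter 8 draws.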

Learned dynamics models in deep RL

  • Chua, K., Calandra, R., McAllister, R., & Levine, S. (2018). Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. NeurIPS 2018. https://arxiv.org/abs/1805.12114 PETS (probabilistic ensembles with trajectory sampling). Source of the headline sample-efficiency claim: on half-cheetah, PETS matches model-free asymptotic performance with roughly 8× fewer samples than SAC and 125× fewer than PPO.
  • Janner, M., Fu, J., Zhang, M., & Levine, S. (2019). When to Trust Your Model: Model-Based Policy Optimization. NeurIPS 2019. https://arxiv.org/abs/1906.08253 Short imagined rollouts (1 to 5 steps) feeding a SAC-style model-free learner; the practical recipe.
  • Ha, D., & Schmidhuber, J. (2018). World Models. arXiv:1803.10122. https://arxiv.org/abs/1803.10122 Training policies entirely in a learned latent dream-world. A version appeared at NeurIPS 2018 as “Recurrent World Models Facilitate Policy Evolution.”
  • Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2020). Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020. https://arxiv.org/abs/1912.01603 DreamerV1.
  • Hafner, D., Lillicrap, T., Norouzi, M., & Ba, J. (2021). Mastering Atari with Discrete World Models. ICLR 2021. https://arxiv.org/abs/2010.02193 DreamerV2.
  • Hafner, D., Pasukonis, J., Ba, J., & Lillicrap, T. (2023). Mastering Diverse Domains through World Models. arXiv:2301.04104. https://arxiv.org/abs/2301.04104 DreamerV3.
  • Schrittwieser, J., Antonoglou, I., Hubert, T., et al. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588, 604-609. https://www.nature.com/articles/s41586-020-03051-4 MuZero. Learns a latent dynamics model trained only to predict rewards, values, and policies (not observations), and plans with it via MCTS.
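PETS and MBPO share one mechanic: roll a learned ensemble forward for a few steps from states the agent actually visited, drawing a random ensemble member at each step (as in PETS's TS1 variant). The following is a minimal sketch of that mechanic, assuming a hypothetical scalar environment, a fixed linear policy, and three hand-written linear models standing in for trained networks. `branched_rollouts` and its parameters are illustrative, not code from either paper. In MBPO the imagined transitions would feed a SAC learner. PETS instead uses sampled trajectories to score candidate action sequences inside a model-predictive-control planner, which this sketch omits.

```python
import random

def branched_rollouts(real_states, ensemble, policy, horizon, n_starts, rng):
    """Short imagined rollouts that branch from states observed in the real environment.

    At every step one ensemble member is drawn at random, so disagreement between
    members shows up as spread across the imagined transitions.
    """
    imagined = []
    for s in rng.sample(real_states, n_starts):   # branch points come from real data
        for _ in range(horizon):                  # short horizon limits compounding model error
            a = policy(s)
            model = rng.choice(ensemble)
            s_next, r = model(s, a)
            imagined.append((s, a, r, s_next))
            s = s_next
    return imagined

# Hypothetical stand-ins: three linear "learned" models that disagree slightly,
# each returning (next_state, reward) with a quadratic cost as the reward.
ensemble = [lambda s, a, k=k: (k * s + 0.5 * a, -s * s) for k in (0.88, 0.90, 0.92)]

def policy(s):
    """Hypothetical fixed linear policy."""
    return -0.5 * s

rng = random.Random(0)
real_states = [rng.uniform(-1.0, 1.0) for _ in range(100)]
batch = branched_rollouts(real_states, ensemble, policy, horizon=3, n_starts=10, rng=rng)
print(len(batch), batch[0])
```

Keeping `horizon` small is the design choice MBPO argues for: each extra model step compounds prediction error, while branching from real states keeps the imagined data near the real state distribution.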

Berkeley CS285 (course source for this track)

  • Levine, S. (2023). CS285 lecture on Model-Based Reinforcement Learning. UC Berkeley. https://rail.eecs.berkeley.edu/deeprlcourse/ Lecture covering model fitting and Dyna. Its natural pair is CS285 L16 (planning with the model), the source for L10.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Chapter 5 (Machine Learning Basics) covers least squares and the bias-variance decomposition referenced in the lesson.
  • Boyd, S., & Vandenberghe, L. (2018). Introduction to Applied Linear Algebra. Cambridge University Press. Chapters 12-13 on least-squares estimation. Free online at https://web.stanford.edu/~boyd/vmls/.
  • Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. NeurIPS 2017. https://arxiv.org/abs/1612.01474 The “deep ensembles” paper that’s standard for epistemic uncertainty estimation. Used by PETS.
  • Kendall, A., & Gal, Y. (2017). What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? NeurIPS 2017. https://arxiv.org/abs/1703.04977 The clearest articulation of the aleatoric / epistemic split.
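Several entries above meet in one small computation: Goodfellow Ch. 5 and Boyd & Vandenberghe Ch. 12-13 on least squares, the deep-ensembles paper, and Kendall & Gal. The computation fits a dynamics model by least squares, reads aleatoric noise off the residuals, and uses disagreement across an ensemble as an epistemic-uncertainty signal. The following is a minimal sketch, assuming a hypothetical scalar linear system s' = a·s + b·u + Gaussian noise. `collect`, `fit_least_squares`, `fit_ensemble`, `predict`, and all constants are illustrative. Bootstrap resampling is swapped in for the random-initialization ensembles of Lakshminarayanan et al., since a closed-form least-squares fit has no initialization to vary.

```python
import random
import statistics

A_TRUE, B_TRUE, NOISE = 0.9, 0.5, 0.1  # hypothetical system: s' = a*s + b*u + noise

def collect(n, rng):
    """Sample n transitions with s, u uniform in [-1, 1] (the training distribution)."""
    data = []
    for _ in range(n):
        s, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((s, u, A_TRUE * s + B_TRUE * u + rng.gauss(0, NOISE)))
    return data

def fit_least_squares(data):
    """Solve the 2x2 normal equations for (a, b) minimizing sum (s' - a*s - b*u)^2."""
    sss = sum(s * s for s, _, _ in data)
    ssu = sum(s * u for s, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    ssy = sum(s * y for s, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sss * suu - ssu * ssu
    return (ssy * suu - ssu * suy) / det, (sss * suy - ssu * ssy) / det

def fit_ensemble(data, k, rng):
    """Bootstrap ensemble: each member is fit on a resample drawn with replacement."""
    return [fit_least_squares(rng.choices(data, k=len(data))) for _ in range(k)]

def predict(models, s, u):
    """Ensemble mean and disagreement (std dev across members)."""
    preds = [a * s + b * u for a, b in models]
    return statistics.mean(preds), statistics.stdev(preds)

rng = random.Random(0)
data = collect(200, rng)
a_hat, b_hat = fit_least_squares(data)
# Residual spread estimates the aleatoric noise; more data does not shrink it.
aleatoric = statistics.pstdev([y - a_hat * s - b_hat * u for s, u, y in data])
ensemble = fit_ensemble(data, 5, rng)
_, near = predict(ensemble, 0.5, 0.0)   # query inside the training range
_, far = predict(ensemble, 50.0, 0.0)   # query far outside it
print(f"a={a_hat:.3f} b={b_hat:.3f} aleatoric={aleatoric:.3f} near={near:.4f} far={far:.4f}")
```

Collecting more data (say `collect(2000, rng)`) should shrink the ensemble disagreement, while `aleatoric` stays near `NOISE`. That contrast is the aleatoric/epistemic split in miniature.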
Source curriculum (structural mirror, cited as further study):
• UC Berkeley CS285: Deep Reinforcement Learning (Sergey Levine)
Course page: http://rail.eecs.berkeley.edu/deeprlcourse/
Lecture videos: YouTube (link-out only)
Clawdemy's lessons are original prose that follows the pedagogical arc of this
source. We do not reproduce or transcribe it; we cite it as a recommended
companion. All rights to the original material remain with its authors.