References: Multi-task RL and meta-RL

  • Caruana, R. (1997). Multitask Learning. Machine Learning, 28, 41-75. https://link.springer.com/article/10.1023/A:1007379606734 The foundational multi-task learning paper. Source of the shared-representations framing and the reference point for discussions of positive and negative transfer.
  • Chen, Z., Badrinarayanan, V., Lee, C.-Y., & Rabinovich, A. (2018). GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. ICML 2018. https://arxiv.org/abs/1711.02257 Gradient normalization for multi-task balancing.
  • Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., & Finn, C. (2020). Gradient Surgery for Multi-Task Learning. NeurIPS 2020. https://arxiv.org/abs/2001.06782 Gradient-projection techniques for handling task-gradient conflicts.
  • Finn, C., Abbeel, P., & Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017. https://arxiv.org/abs/1703.03400 The MAML paper. Defines the model-agnostic meta-learning objective and inner-outer optimization loop.
  • Finn, C., & Levine, S. (2018). Meta-Learning and Universality: Deep Representations and Gradient Descent Can Approximate Any Learning Algorithm. ICLR 2018. https://arxiv.org/abs/1710.11622 Theoretical analysis of MAML’s expressiveness.
  • Nichol, A., Achiam, J., & Schulman, J. (2018). On First-Order Meta-Learning Algorithms. arXiv:1803.02999. https://arxiv.org/abs/1803.02999 Reptile, a first-order meta-learning algorithm in the MAML family that avoids differentiating through the inner loop, so no second-order meta-gradient is needed.
  • Duan, Y., Schulman, J., Chen, X., et al. (2016). RL²: Fast Reinforcement Learning via Slow Reinforcement Learning. arXiv:1611.02779. https://arxiv.org/abs/1611.02779 The RL² paper. Casts meta-RL as training a recurrent policy whose hidden state persists across episodes, so the unknown task is handled as partial observability.
  • Wang, J. X., Kurth-Nelson, Z., Tirumala, D., et al. (2016). Learning to Reinforcement Learn. arXiv:1611.05763. https://arxiv.org/abs/1611.05763 Companion paper to RL²; introduces the recurrent-meta-RL framing from a neuroscience-inspired angle.
  • Rakelly, K., Zhou, A., Quillen, D., Finn, C., & Levine, S. (2019). Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. ICML 2019. https://arxiv.org/abs/1903.08254 The PEARL paper. Probabilistic embeddings for the task latent variable; combines with SAC.
  • Zintgraf, L., Shiarlis, K., Igl, M., Schulze, S., Gal, Y., Hofmann, K., & Whiteson, S. (2020). VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning. ICLR 2020. https://arxiv.org/abs/1910.08348 The variBAD paper. A Bayesian take on RL²; learns a variational task posterior for the recurrent meta-RL family.
  • Yu, T., Quillen, D., He, Z., et al. (2019). Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning. CoRL 2019. https://arxiv.org/abs/1910.10897 Standard meta-RL benchmark with 50 robotic manipulation tasks.
  • Varghese, N. V., & Mahmoud, Q. H. (2020). A Survey of Multi-Task Deep Reinforcement Learning. Electronics, 9(9), 1363. https://www.mdpi.com/2079-9292/9/9/1363 Survey of multi-task RL practices.
  • Brown, T., Mann, B., Ryder, N., et al. (2020). Language Models are Few-Shot Learners. NeurIPS 2020. https://arxiv.org/abs/2005.14165 The GPT-3 paper. In-context learning as implicit meta-learning at scale. Worth re-reading from the L17 angle: the framings of academic meta-RL are recognizable in a production system.
  • Wei, J., Tay, Y., Bommasani, R., et al. (2022). Emergent Abilities of Large Language Models. TMLR 2022. https://arxiv.org/abs/2206.07682 The characterization of in-context learning as a scale-emergent capability; relates to multi-task and meta-RL framings.
  • Vinyals, O., Babuschkin, I., Czarnecki, W. M., et al. (2019). Grandmaster level in StarCraft II using Multi-agent Reinforcement Learning. Nature, 575, 350-354. https://www.nature.com/articles/s41586-019-1724-z AlphaStar; multi-task training at scale on a complex environment.
  • Kalashnikov, D., Irpan, A., Pastor, P., et al. (2018). QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. CoRL 2018. https://arxiv.org/abs/1806.10293 Large-scale single-skill grasping with a distributed Q-learning pipeline (not formally multi-task RL); demonstrated the offline-then-online structure that multi-task and meta-RL extensions then built on. Cited in L14 as well.
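
As a small illustration of the gradient-projection idea in the Yu et al. (2020) entry above, here is a minimal sketch of the core projection step for two task gradients. It assumes gradients are plain lists of floats; the helper names `dot` and `project_conflicting` are hypothetical and not taken from the paper's code. The full method also visits the other tasks in random order and sums the projected gradients, which this sketch leaves out.

```python
def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(g_i, g_j):
    """Return g_i with its component along g_j removed when the two conflict.

    Two task gradients are treated as conflicting when their inner product
    is negative. Non-conflicting gradients are returned unchanged (as a copy).
    """
    d = dot(g_i, g_j)
    if d >= 0.0:
        return list(g_i)
    # Subtract the projection of g_i onto g_j; d < 0 implies g_j is non-zero.
    scale = d / dot(g_j, g_j)
    return [a - scale * b for a, b in zip(g_i, g_j)]


# Illustrative values only: two hand-picked gradients that point partly
# against each other.
g_task_a = [1.0, 0.0]
g_task_b = [-1.0, 1.0]
g_a_projected = project_conflicting(g_task_a, g_task_b)
```

After the projection, `g_a_projected` is orthogonal to `g_task_b`, so a step along it no longer pushes directly against the second task.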

CS285 covers both multi-task RL and the main meta-RL families. The primary algorithm citations (MAML, RL², PEARL) are the canonical papers. The foundation-model connection is made through Brown et al. (GPT-3) and the characterization papers: the structure that academic meta-RL isolates in clean settings is recognizable in production AI, although foundation-model practitioners do not formally frame it as meta-RL. The lesson notes this parallel without endorsing any particular practitioner’s framing.

Source curriculum (structural mirror, cited as further study):
• UC Berkeley CS285: Deep Reinforcement Learning (Sergey Levine)
Course page: http://rail.eecs.berkeley.edu/deeprlcourse/
Lecture videos: YouTube (link-out only)
Clawdemy's lessons are original prose that follows the pedagogical arc of this
source. We do not reproduce or transcribe it; we cite it as a recommended
companion. All rights to the original material remain with its authors.