References: Learning by trial and reward

Source material

Source curriculum (structural mirror, cited as further study):
• MIT 6.S191, "Introduction to Deep Learning", Lecture 5: "Deep Reinforcement Learning"
  Instructors: Alexander Amini and Ava Amini (MIT)
  Course page: https://introtodeeplearning.com
  Code and labs: https://github.com/aamini/introtodeeplearning
  License: MIT (slides, code, and labs); videos are YouTube standard
  Required attribution: "© Alexander Amini and Ava Amini, MIT 6.S191:
    Introduction to Deep Learning, IntroToDeepLearning.com"
Clawdemy's lessons are original prose that follows the pedagogical arc of this
course. We do not reproduce or transcribe the lectures; we cite them as the
recommended companion. Course materials are used under their MIT license with
the attribution above; all rights to the original videos remain with the creators.

Watch this next

MIT 6.S191, Lecture 5: Deep Reinforcement Learning by Alexander and Ava Amini. The lecture this lesson mirrors. It walks through the agent-environment-reward loop with the instructors’ diagrams and shows agents learning to play games: an animated counterpart to this lesson’s maze and loop.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy. A hands-on walk-through that trains an agent to play Pong from raw pixels in about 130 lines of Python. It is the most concrete way to watch this lesson’s loop (state, action, reward, improve) turn into a working agent.
Reinforcement Learning: An Introduction (2nd edition) by Richard Sutton and Andrew Barto. The standard textbook of the field, made freely available online by the authors (search the title). Dense but definitive, it builds the loop, policies, reward, and credit assignment from the ground up. The deep reference once the intuition here has landed.
The MIT 6.S191 software labs. The reinforcement-learning lab lets you train an agent in a simple environment and watch its policy improve. MIT-licensed; the hands-on companion to this lesson.

Adjacent topics

Where this connects inside the track.

Generating by denoising: diffusion (lesson 7). The previous lesson closed the generative phase. This lesson opened the final phase with a different kind of learning: from rewards rather than from a fixed dataset.
Where deep learning breaks (lesson 9). This lesson hinted that real-world reinforcement learning is hard (sample-inefficient, brittle). The next lesson follows that thread across all of deep learning, covering the limitations that confident demos tend to leave out.