References: Function approximation and deep RL
Source material
Source curriculum (structural mirror, cited as further study):
• David Silver, "Reinforcement Learning" (UCL course), Lecture 6: Value Function Approximation
  Author: David Silver
  Course page: https://davidstarsilver.wordpress.com/teaching/
  License: CC BY-NC 4.0

Clawdemy's lessons are original prose that follows the pedagogical arc of this course. We do not embed, reproduce, or transcribe Silver's slides or video lectures; we link out to the relevant lecture as recommended further study. The non-commercial clause aligns with Clawdemy's free, zero-revenue posture. All rights to the original materials remain with the author and UCL.
Source-scope note: this lesson mirrors the value-function-approximation material in Silver's Lecture 6 (linear and neural-network parameterization, the semi-gradient TD update, the deadly triad) and adds the DQN bridge with the two engineering fixes (experience replay, target network) at the close. The one-feature linear-Q worked example illustrating generalization across states from a single transition, and the step-size-too-large overshoot diagnostic in practice, are Clawdemy framing designed to make the "function approximation is delicate" intuition concrete. Exact per-lecture URLs are verified at promotion. The DQN paper reference (Mnih et al. 2015) is given in "Going deeper" below.
Read this next
- David Silver, UCL RL course, Lecture 6: Value Function Approximation. The lecture this lesson mirrors, covering linear and nonlinear function approximation, the semi-gradient form, the deadly triad explicitly named, and a brief look at DQN. CC BY-NC 4.0, freely available.
Going deeper
A short, durable list. Both are free.
- Mnih et al., “Human-level control through deep reinforcement learning” (Nature, 2015), the DQN paper. The full algorithm (Q-learning + a deep convolutional network + experience replay + target network) and the Atari benchmark that catalyzed modern deep RL. Widely available online.
- Sutton and Barto, “Reinforcement Learning: An Introduction” (2nd edition), Chapter 11 (Off-policy Methods with Approximation). The textbook treatment of the deadly triad and the practical remedies, with the theoretical analysis behind the fixes DQN uses.
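The two engineering fixes named in the DQN entry above (experience replay and a target network) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it swaps the deep convolutional network for a linear Q-function in NumPy, and every name here (`ReplayBuffer`, `td_target`, `train_step`) is hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next, done) transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.data), batch_size)


def q_values(weights, state):
    """Linear Q (an assumption for brevity): Q(s, a) = weights[a] . state."""
    return weights @ state


def td_target(target_weights, reward, next_state, done, gamma):
    """Bootstrap from the frozen target weights, not the online ones."""
    if done:
        return reward
    return reward + gamma * np.max(q_values(target_weights, next_state))


def train_step(online, target, batch, alpha, gamma):
    """One semi-gradient Q-learning pass over a minibatch; updates `online` in place."""
    for s, a, r, s_next, done in batch:
        delta = td_target(target, r, s_next, done, gamma) - q_values(online, s)[a]
        # The gradient of Q(s, a) with respect to online[a] is the state vector s.
        online[a] += alpha * delta * s
```

In a training loop under these assumptions, `target` would be overwritten with a copy of `online` only every so many steps, so the bootstrap target stays fixed between syncs while the online weights move.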
Adjacent topics
Where this leads inside this track.
- Q-learning: model-free control. The previous lesson. This lesson takes Q-learning’s exact update and replaces the table with a function approximator; the recursion does not change.
- Temporal-difference learning. Lesson 7. The deadly triad was named there; this lesson is where the third leg (function approximation) is added and the triad becomes a real engineering concern.
- Policy gradient and the path to modern RL. The next lesson and the close of the track. The function-approximation move done here on V/Q is paralleled there for the policy itself, learning a parameterized policy directly. The two halves together (value-based + policy-based) cover the modern landscape and bridge to RLHF.
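The first item above says the Q-learning update carries over unchanged once the table is replaced by a function approximator. A rough sketch of that idea with a one-feature linear Q follows; the feature values and hyperparameters are made up for illustration, and the function names are hypothetical rather than taken from the lesson.

```python
def q(w, x):
    """One-feature linear Q: Q(s, a) = w * x(s, a)."""
    return w * x


def semi_gradient_q_update(w, x_sa, reward, next_features, alpha, gamma):
    """Return the updated weight after one transition.

    x_sa: feature of the visited (state, action) pair.
    next_features: features x(s', a') for every action a' in the next state
                   (empty when the next state is terminal).
    """
    bootstrap = max((q(w, x) for x in next_features), default=0.0)
    td_error = reward + gamma * bootstrap - q(w, x_sa)
    # Semi-gradient: differentiate Q(s, a) only, treating the target as a constant.
    return w + alpha * td_error * x_sa


w = 0.0
w = semi_gradient_q_update(
    w, x_sa=1.0, reward=1.0, next_features=[0.5, 2.0], alpha=0.1, gamma=0.9
)
# A pair that was never visited (feature 2.0) now has a changed estimate too,
# because every estimate shares the single weight w.
unvisited_q = q(w, 2.0)
```

The recursion is the tabular one; the only change is that the TD error now moves a shared weight, which is why a single transition shifts the estimate for pairs that were never visited.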