References: Function approximation and deep RL

Source material

Source curriculum (structural mirror, cited as further study):
• David Silver, "Reinforcement Learning" (UCL course), Lecture 6:
  Value Function Approximation
  Author: David Silver
  Course page: https://davidstarsilver.wordpress.com/teaching/
  License: CC BY-NC 4.0
Clawdemy's lessons are original prose that follows the pedagogical arc of this
course. We do not embed, reproduce, or transcribe Silver's slides or video
lectures; we link out to the relevant lecture as recommended further study.
The non-commercial clause is consistent with Clawdemy's own CC BY-NC-SA 4.0 license; both forbid commercial use without permission. Commercial use is licensed separately at [/legal/licensing](/legal/licensing/).
All rights to the original materials remain with the author and UCL.

Source-scope note: this lesson mirrors the value-function-approximation
material in Silver's Lecture 6 (linear and neural-network parameterization,
the semi-gradient TD update, the deadly triad) and adds the DQN bridge with
the two engineering fixes (experience replay, target network) at the close.
The one-feature linear-Q worked example (generalization across states from
a single transition) and the overshoot diagnostic for a too-large step size
are Clawdemy's own framing, designed to make the
"function approximation is delicate" intuition concrete. Exact per-lecture
URLs are verified at promotion. The DQN paper reference (Mnih et al. 2015)
is given in "Going deeper" below.
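
As a rough illustration of the generalization and overshoot points above,
here is a minimal Python sketch of one semi-gradient Q-learning step with a
one-feature linear approximator. It is not the lesson's worked example or
code from Silver's course; the feature values, reward, step sizes, and the
function names (`q_hat`, `semi_gradient_q_update`, `error_after`) are
assumptions chosen only for illustration.

```python
def q_hat(w, x):
    """Linear action-value estimate with a single scalar feature x(s, a)."""
    return w * x


def semi_gradient_q_update(w, x, reward, next_xs, alpha, gamma):
    """One semi-gradient Q-learning step on a single weight w.

    next_xs holds the feature value x(s', a') for each action a' in the
    next state. The bootstrapped target is treated as a constant, which is
    what makes this a semi-gradient (not full-gradient) update.
    """
    target = reward + gamma * max(q_hat(w, nx) for nx in next_xs)
    td_error = target - q_hat(w, x)
    return w + alpha * td_error * x  # d(w * x)/dw = x


# Hypothetical feature values, chosen only for illustration.
x_visited = 1.0    # feature of the (state, action) pair in the transition
x_unvisited = 2.0  # feature of a pair that was never visited

w = 0.0
before = q_hat(w, x_unvisited)
# next_xs=[0.0] makes the bootstrapped term zero, like a terminal next state.
w = semi_gradient_q_update(w, x_visited, reward=1.0, next_xs=[0.0],
                           alpha=0.5, gamma=0.9)
after = q_hat(w, x_unvisited)  # changed although this pair was never seen


def error_after(alpha, steps):
    """Absolute error |1 - w| after replaying the same transition."""
    w = 0.0
    for _ in range(steps):
        w = semi_gradient_q_update(w, 1.0, 1.0, [0.0], alpha, 0.9)
    return abs(1.0 - w)


small_step_error = error_after(alpha=0.5, steps=5)  # shrinks toward zero
large_step_error = error_after(alpha=2.5, steps=5)  # grows: overshoot
```

With these assumed numbers, a single update on the visited pair also moves
the estimate for the unvisited pair, because both share the weight `w`.
Replaying the same transition with `alpha = 0.5` shrinks the error each
step, while `alpha = 2.5` overshoots the target and the error grows.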

Going deeper

A short, durable list. Both are free.

• Mnih et al., “Human-level control through deep reinforcement learning” (Nature, 2015): the DQN paper. The full algorithm (Q-learning + a deep convolutional network + experience replay + target network) and the Atari benchmark that catalyzed modern deep RL. Available widely online.
• Sutton and Barto, “Reinforcement Learning: An Introduction” (2nd edition), Chapter 11 (Off-policy Methods with Approximation). The textbook treatment of the deadly triad and the practical remedies, with the theoretical analysis behind the fixes DQN uses.

Adjacent topics

Where this leads inside this track.

• Q-learning: model-free control. The previous lesson. This lesson takes Q-learning’s exact update and replaces the table with a function approximator; the recursion does not change.
• Temporal-difference learning. Lesson 7. The deadly triad was named there; this lesson is where the third leg (function approximation) is added and the triad becomes a real engineering concern.
• Policy gradient and the path to modern RL. The next lesson and the close of the track. The function-approximation move done here on V/Q is paralleled there for the policy itself, learning a parameterized policy directly. The two halves together (value-based + policy-based) cover the modern landscape and bridge to RLHF.
