References: What learning really means

Source material

Source curriculum (structural mirror, cited as further study):
• 3Blue1Brown, Neural Networks, Chapter 2: "Gradient descent, how neural networks learn"
  Creator: Grant Sanderson (text adaptation by Josh Pullen)
  Lesson page: https://www.3blue1brown.com/lessons/gradient-descent
  Series index: https://www.3blue1brown.com/?topic=neural-networks
  License: copyright Grant Sanderson; videos published on his site and YouTube

This lesson mirrors the opening of Chapter 2, where the cost function is
introduced and learning is framed as minimizing it. Clawdemy's lessons are
original prose that follows the pedagogical arc of this series. We do not
reproduce or transcribe the videos; we cite them as the recommended companion.
All rights to the original videos remain with the creator.

Watch this next

Gradient descent, how neural networks learn (3Blue1Brown) by Grant Sanderson. Chapter 2 of the series, and the source for this lesson and the next two. The opening minutes introduce the cost idea visually: a confident, correct output scores a low cost and a confused output scores a high one. Watch up to where the “cost landscape” picture appears, then come back for lesson 6, which builds on exactly that picture.

Going deeper

A short, durable list. Each link is a specific next step, not a generic pile.

Neural Networks and Deep Learning, Chapter 1 (the “cost function” section) by Michael Nielsen. Introduces the same squared-difference cost (Nielsen calls it the quadratic cost) and explains carefully why a smooth cost is what makes learning tractable. The natural deeper read on this exact idea.
TensorFlow Playground. Run a network and watch the “Training loss” number in the corner fall as it learns. That falling number is a cost of the same kind as the one in this lesson: a single number measuring how wrong the network currently is. Seeing it drop in real time makes “learning is minimizing a number” concrete.
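
If you would rather see the squared-difference cost as code before following either link, here is a minimal sketch in Python. The function name and both sets of output activations are invented for illustration; they are not taken from the video or from Nielsen's book. Nielsen's quadratic cost also averages over training examples and carries a factor of 1/2, which this single-example sketch leaves out.

```python
def squared_difference_cost(output, target):
    """Sum of squared differences between an output and the desired output."""
    if len(output) != len(target):
        raise ValueError("output and target must have the same length")
    return sum((o - t) ** 2 for o, t in zip(output, target))


# Desired output for an image of the digit 3: 1.0 at index 3, 0.0 elsewhere.
target = [0.0] * 10
target[3] = 1.0

# Hypothetical activations, made up for this example.
confident = [0.0] * 10
confident[3] = 0.9        # almost everything on the correct digit
confused = [0.5] * 10     # no preference for any digit

cost_confident = squared_difference_cost(confident, target)
cost_confused = squared_difference_cost(confused, target)

print("confident and correct:", cost_confident)
print("confused:", cost_confused)
```

The confident output lands at a cost of roughly 0.01, while the confused one comes out at 2.5. That gap is the whole idea: one number that is small when the network is right and large when it is not.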

Adjacent topics

Where this leads inside this track.

The cost landscape (lesson 6). This lesson said the cost is a function of about 13,000 parameters. Lesson 6 turns that into a picture: a landscape over the space of all possible parameter settings, where height is cost, and the goal is to find a low valley.
Gradient descent (lesson 7). Once you can picture the landscape, you need a way to actually walk downhill in it without being able to see the whole thing at once. Lesson 7 is that method, the algorithm that gives this whole chapter its name.