Fitting a line: linear regression

This is lesson 2 of Track 10, in Phase 1 (What learning from data means). By the end you will be able to explain what a “best-fit line” actually is (the one line that makes the sum of squared residuals as small as possible) and read a fitted slope and intercept as a plain-language statement about a relationship. That single skill, turning two numbers into “for every extra unit of input, the output changes by this much,” is the foundation of every model that has weights, from this two-parameter line up to a billion-parameter network.
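As a minimal sketch of that definition, the snippet below scores two candidate lines by their sum of squared residuals (SSR) and then computes the least-squares line with the standard closed-form formulas for one input. The data (hours studied against exam score), the two candidate lines, and the helper names `ssr` and `least_squares_line` are all invented for illustration; none of them come from the lesson itself.

```python
# Toy data, invented for illustration: hours studied (x) vs. exam score (y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [52.0, 55.0, 61.0, 64.0, 68.0]


def ssr(slope, intercept, xs, ys):
    """Sum of squared residuals for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form slope and intercept that minimize the SSR (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Two hypothetical candidate lines: the smaller SSR is the better fit.
ssr_a = ssr(4.0, 48.0, xs, ys)  # SSR = 2.0
ssr_b = ssr(3.0, 50.0, xs, ys)  # SSR = 19.0
print(f"Line A SSR: {ssr_a}, Line B SSR: {ssr_b}")

# The best-fit line does at least as well as any candidate.
best_slope, best_intercept = least_squares_line(xs, ys)
best_ssr = ssr(best_slope, best_intercept, xs, ys)
print(f"Best fit: score = {best_slope:.2f} * hours + {best_intercept:.2f} (SSR {best_ssr:.2f})")
```

On this toy data the fitted slope is about 4.1, which reads as "each extra hour of study goes with roughly 4.1 more points," and the intercept of about 47.7 is the score the line predicts at zero hours. Line A comes close to the best fit, while line B's much larger SSR marks it as the worse description of the same points.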

The track mirrors the structure of StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.

Lesson 1 drew the map of the field; this lesson plants the first concrete algorithm on it, the simplest supervised method there is. It is deliberately the second lesson because everything later leans on its core idea: a model is a set of numbers chosen to fit the data. The next lesson, gradient descent, answers the question this one leaves open (how do you actually find the best line?), and lesson 4, logistic regression, bends this same machinery into a classifier.

Prerequisite: Lesson 1, What machine learning actually is. You need the idea of supervised learning and the distinction between regression (predicting a number) and classification (predicting a category), because linear regression is the archetypal regression method. No calculus is required; comfort with the equation of a straight line is enough.

  • Describe a regression line as two parameters (slope and intercept)
  • Define the best-fit line as the one that minimizes the sum of squared residuals
  • Compute the sum of squared residuals by hand and use it to compare two lines
  • Read a slope and intercept as a real-world relationship
  • Explain how the idea extends to many features and to the weights of larger models
  • Read time: about 12 minutes
  • Practice time: about 15 minutes (a by-hand SSR computation, a coefficient-reading exercise, and flashcards)
  • Difficulty: standard