Fitting a line: linear regression
What you’ll learn
This is lesson 2 of Track 10, in Phase 1 (What learning from data means). By the end you will be able to explain what a “best-fit line” actually is (the one line that makes the sum of squared residuals as small as possible) and read a fitted slope and intercept as a plain-language statement about a relationship. That single skill, turning two numbers into “for every extra unit of input, the output changes by this much,” is the foundation of every model that has weights, from this two-parameter line up to a billion-parameter network.
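For readers who like to see the definition in symbols, the best-fit idea above can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the lesson: $m$ is the slope, $b$ the intercept, $(x_i, y_i)$ the $n$ data points, and $\hat{y}_i$ the line's prediction for point $i$.

```latex
\hat{y}_i = m x_i + b,
\qquad
\mathrm{SSR}(m, b) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

In this notation, the best-fit line is the pair $(m, b)$ that makes $\mathrm{SSR}(m, b)$ as small as possible, and the slope $m$ is the amount the prediction changes when the input grows by one unit.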
The track's structure follows StatQuest’s intuition-first machine learning videos, and Microsoft’s “ML For Beginners” serves as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.
Where this fits
Lesson 1 drew the map of the field; this lesson plants the first concrete algorithm on it, the simplest supervised method there is. It is deliberately the second lesson because everything later leans on its core idea: a model is a set of numbers chosen to fit the data. The next lesson, gradient descent, answers the question this one leaves open (how do you actually find the best line?), and lesson 4, logistic regression, bends this same machinery into a classifier.
Before you start
Prerequisite: Lesson 1, What machine learning actually is. You need the idea of supervised learning and the distinction between regression (predicting a number) and classification, because linear regression is the archetypal regression method. No calculus required; comfort with the equation of a straight line is enough.
By the end, you’ll be able to
- Describe a regression line as two parameters (slope and intercept)
- Define the best-fit line as the one that minimizes the sum of squared residuals
- Compute the sum of squared residuals by hand and use it to compare two lines
- Read a slope and intercept as a real-world relationship
- Explain how the idea extends to many features and to the weights of larger models
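As a preview of the by-hand SSR comparison named in the objectives above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four data points are invented, the two candidate lines are arbitrary, and the helper names `predict` and `sum_squared_residuals` are hypothetical rather than taken from the lesson.

```python
# Hypothetical toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]


def predict(slope, intercept, x):
    """Return the line's prediction for input x."""
    return slope * x + intercept


def sum_squared_residuals(slope, intercept, xs, ys):
    """Add up (actual - predicted)^2 over every data point."""
    return sum((y - predict(slope, intercept, x)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines to compare (arbitrary choices, not fitted values).
ssr_a = sum_squared_residuals(2.0, 0.0, xs, ys)  # line A: y = 2x
ssr_b = sum_squared_residuals(1.0, 2.0, xs, ys)  # line B: y = x + 2

print("Line A SSR:", ssr_a)  # residuals 0, 0, -1, 0 -> 1.0
print("Line B SSR:", ssr_b)  # residuals -1, 0, 0, 2 -> 5.0

# Reading the slope: one extra unit of input changes the prediction by the slope.
change_per_unit = predict(2.0, 0.0, 3.0) - predict(2.0, 0.0, 2.0)
print("Change per unit of x on line A:", change_per_unit)  # 2.0
```

On this toy data, line A has the smaller sum of squared residuals, so it is the better fit of the two. That does not make it the best-fit line overall; it only wins this head-to-head comparison.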
Time and difficulty
- Read time: about 12 minutes
- Practice time: about 15 minutes (a by-hand SSR computation, a coefficient-reading exercise, and flashcards)
- Difficulty: standard