Summary: Fitting a line: linear regression

Linear regression is one of the simplest supervised learning algorithms: it picks the one straight line that fits your data best, where “best” means the smallest sum of squared residuals. A line is just two numbers, a slope and an intercept. Those numbers are the model’s parameters, and reading them gives you a plain-language statement about the relationship. This summary is the skimmable version of the full lesson.

Core ideas

A line is a prediction machine. The formula is prediction = intercept + slope × feature: feed in an input, get a predicted output. The slope and intercept are the model’s parameters (weights); training means choosing them to fit the data.
Residuals measure error. A residual is the vertical gap between an actual point and the line’s prediction for it.
Best fit means least squares. Square each residual (so errors do not cancel and big misses cost more), add them up to get the sum of squared residuals (SSR), and choose the line that makes the SSR as small as possible. That is the whole definition.
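Comparing lines is just comparing SSR. Two lines, two SSR numbers, lower wins. The true best-fit line is the one whose SSR is lower than that of every other possible line. That minimum is usually not zero, because real data rarely falls exactly on a line.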
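The slope is the relationship. It is the predicted change in the output per one-unit change in the input: its sign is the direction, its size is the steepness. Because the size depends on the units of the input and output, it says how much the output moves, not how tightly the points hug the line. The intercept is the prediction when the input is zero.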
Coefficients are readable, which is linear regression’s great advantage over more complex models. Multiple regression just gives each feature its own coefficient.
R-squared reports the fraction of the variation in the output that the line explains, from 0 (no better than always predicting the average) to 1 (a perfect fit), as a quick unit-free sense of fit.
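
The core ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lesson's own code: the toy data (hours studied and exam scores), the two candidate lines, and the function names are all made up for the example. The fit_line function uses the standard closed-form formula for a one-feature least-squares line, which is one way to find the best fit and is not necessarily the method the full lesson uses.

```python
# Hypothetical toy data: hours studied (input) and exam score (output).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]


def predict(intercept, slope, x):
    """A line is a prediction machine: intercept + slope * feature."""
    return intercept + slope * x


def residuals(intercept, slope, xs, ys):
    """Vertical gap between each actual value and the line's prediction."""
    return [y - predict(intercept, slope, x) for x, y in zip(xs, ys)]


def ssr(intercept, slope, xs, ys):
    """Sum of squared residuals: square each gap, then add them up."""
    return sum(r ** 2 for r in residuals(intercept, slope, xs, ys))


def fit_line(xs, ys):
    """Closed-form least-squares fit for one feature.

    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(intercept, slope, xs, ys):
    """1 - SSR / (total squared variation of the output around its mean)."""
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ssr(intercept, slope, xs, ys) / total


# Comparing lines is just comparing SSR: two guesses, lower wins.
ssr_a = ssr(50.0, 3.0, hours, scores)  # candidate A: 50 + 3 * hours
ssr_b = ssr(48.0, 4.0, hours, scores)  # candidate B: 48 + 4 * hours
print("SSR of A:", ssr_a)  # 19.0
print("SSR of B:", ssr_b)  # 2.0

# The best-fit line has a lower SSR than any hand-picked candidate.
best_intercept, best_slope = fit_line(hours, scores)
best_ssr = ssr(best_intercept, best_slope, hours, scores)
print("best intercept:", round(best_intercept, 2))
print("best slope:", round(best_slope, 2))
print("best SSR:", round(best_ssr, 2))
print("R-squared:", round(r_squared(best_intercept, best_slope, hours, scores), 3))
```

On this made-up data, candidate B beats candidate A (SSR of 2 against 19), and the fitted line does slightly better than both, with a slope of about 4.1. Read out loud, that slope says each extra hour of study predicts roughly 4.1 more points.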

What changes for you

The phrase “a model has parameters” stops being abstract. A one-feature linear regression has exactly two, a slope and an intercept, and you can read them out loud. That is the seed of every weight in every large model: a number tuned so predictions fit the data, just repeated a few billion times. It also reframes why big models are hard to explain: here the coefficient tells you the relationship directly, but as models grow, the numbers stop being readable, and “why did it predict that?” turns from an easy question into a hard one. The open thread the lesson leaves you with: we defined the best-fit line, but how do you actually find it? That is gradient descent, next.