Practice: Fitting a line: linear regression
Self-check
Seven short questions. Try to answer each one before opening the collapsible. Active retrieval is where the learning sticks.
1. What two numbers define a linear regression line, and what does each do?
Show answer
The slope and the intercept. The slope is how much the prediction changes per one-unit increase in the input; the intercept is the prediction when the input is zero. In machine learning terms they are the model’s parameters: the slope is a weight, and the intercept is often called the bias.
2. What is a residual?
Show answer
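The signed vertical gap between an actual data point and the line’s prediction for it (actual minus predicted). It is the model’s error on that single point: positive when the point sits above the line, negative when it sits below.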
3. Why do we square the residuals before adding them up?
Show answer
Two reasons. Squaring makes every error positive, so errors above and below the line cannot cancel each other out. And it punishes big misses much more than small ones (an error of 4 contributes 16, an error of 2 contributes only 4).
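A minimal sketch of both reasons, using made-up residuals (the four values below are hypothetical, not taken from any dataset in this lesson):

```python
# Hypothetical residuals: two misses above the line, two below.
residuals = [4, -4, 2, -2]

raw_sum = sum(residuals)                      # positives and negatives cancel
squared_sum = sum(r ** 2 for r in residuals)  # every term is non-negative

print(raw_sum)      # 0
print(squared_sum)  # 40
```

The raw sum of 0 would suggest a perfect fit even though every point is missed. The squared sum does not cancel, and the two misses of size 4 account for 32 of the 40.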
4. State the definition of the best-fit line in one sentence.
Show answer
It is the line, out of all possible lines, that makes the sum of squared residuals as small as possible. Because we minimize squared residuals, the method is called least squares.
5. A fitted line reads y = 3 + 2x. What does the 2 tell you?
Show answer
For every one-unit increase in x, the predicted y increases by 2. The slope is the relationship: its sign is the direction, its size is the strength per unit.
6. In multiple regression with three features, how many coefficients are there, and how do you read one?
Show answer
One coefficient per feature (three), plus the intercept. Each coefficient is the predicted change in the output for a one-unit change in that feature, holding the other features fixed.
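To make "holding the other features fixed" concrete, here is a small sketch. The intercept, the three coefficients, and the feature values are all hypothetical and chosen only for illustration:

```python
# Hypothetical fitted model with three features (values are illustrative only).
intercept = 10.0
coefficients = [2.0, -1.0, 0.5]

def predict(features):
    """Intercept plus the coefficient-weighted sum of the features."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

base = predict([3.0, 4.0, 6.0])
bumped = predict([4.0, 4.0, 6.0])  # first feature +1, the other two unchanged

print(base)           # 15.0
print(bumped - base)  # 2.0
```

Raising only the first feature by one unit moves the prediction by exactly the first coefficient, which is how that coefficient is read.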
7. What question does this lesson leave open for the next one?
Show answer
How do you actually find the line that minimizes the sum of squared residuals, when you cannot just compare a couple of guesses? For a simple line there is a formula, but in general you search for it step by step. That search is gradient descent.
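For the single-input case, the formula mentioned above can be sketched as follows. This assumes the standard closed-form least-squares solution (slope equals the sum of products of deviations from the means, divided by the sum of squared deviations of x); the data points are hypothetical:

```python
# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 4.0, 8.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Closed-form least-squares slope and intercept for a single input.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))  # 1.4 1.5
```

No searching is needed here because the answer falls out of the arithmetic directly. The step-by-step search covered in the next lesson is what you reach for when no such shortcut is practical.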
Try it yourself: compute the sum of squared residuals
Here is a dataset and one candidate line. Compute the residual for each point, square it, and add them up to get the SSR.
Data points: (1, 2), (2, 5), (3, 6)
Candidate line: y = 2x + 0.5
Show answer
x=1 -> predicts 2.5, actual 2, residual -0.5, squared 0.25
x=2 -> predicts 4.5, actual 5, residual 0.5, squared 0.25
x=3 -> predicts 6.5, actual 6, residual -0.5, squared 0.25
Sum of squared residuals = 0.25 + 0.25 + 0.25 = 0.75
The SSR for this line is 0.75. On its own that number means little; its job is to be compared against other lines. A line with a smaller SSR fits this data better, and the best-fit line is the one that drives the SSR as low as it will go.
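The same calculation can be sketched in a few lines of Python. The data and the candidate line come from the exercise; the second line, y = 2x, is a hypothetical alternative added only to show what comparing two SSR values looks like:

```python
xs = [1, 2, 3]
ys = [2, 5, 6]

def sum_squared_residuals(slope, intercept):
    """SSR of the line y = slope * x + intercept on the data above."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

candidate = sum_squared_residuals(2, 0.5)    # the line from the exercise
alternative = sum_squared_residuals(2, 0.0)  # hypothetical second line, y = 2x

print(candidate)    # 0.75
print(alternative)  # 1.0
```

Of these two lines, the candidate has the smaller SSR, so it fits this data better. That does not make it the best-fit line, only the better of the two tried.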
Try it yourself: read the coefficients
A model has been fit to predict delivery time from distance, and it comes out as:
delivery_minutes = 12 + 4 * distance_km
Answer three things: what does the slope mean, what does the intercept mean, and how long is a 5 km delivery predicted to take?
Show answer
- Slope (4): each additional kilometre of distance adds 4 minutes to the predicted delivery time.
- Intercept (12): at a distance of zero, the predicted time is 12 minutes (a fixed base time, perhaps handling and handoff).
- Prediction for 5 km: 12 + 4 * 5 = 12 + 20 = 32 minutes.
This is the payoff of linear regression: the coefficients are not just machinery, they are a readable statement about the relationship.
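As a quick check, the fitted line can be written as a function. The function name below is a hypothetical choice; the numbers 12 and 4 are the intercept and slope from the model above:

```python
def predict_delivery_minutes(distance_km):
    """Prediction from the fitted line: 12 minutes base plus 4 minutes per km."""
    return 12 + 4 * distance_km

print(predict_delivery_minutes(0))  # 12
print(predict_delivery_minutes(5))  # 32
print(predict_delivery_minutes(6) - predict_delivery_minutes(5))  # 4
```

The three printed values are the intercept, the 5 km prediction, and the slope, matching the three answers above.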
Flashcards
Ten cards. Click any card to reveal the answer. Use the Print flashcards button for one card per page.
Q. What two numbers define a linear regression line?
The slope (change in prediction per one-unit increase in the input) and the intercept (prediction when the input is zero). These are the model’s parameters.
Q. What is a residual?
The signed vertical gap between an actual data point and the line’s prediction for it (actual minus predicted): the model’s error on that single point.
Q. Why square the residuals?
To make every error positive (so errors above and below the line do not cancel) and to penalize big misses far more than small ones.
Q. Define the best-fit line.
The line that makes the sum of squared residuals as small as possible across all possible lines. This is what “least squares” means.
Q. What does a slope of 2 in y = 3 + 2x tell you?
For every one-unit increase in x, the predicted y rises by 2. The slope is the relationship’s direction and strength per unit.
Q. What does the intercept represent?
The predicted output when every input is zero. Sometimes meaningful, sometimes just where the line crosses the y-axis.
Q. How does multiple regression extend the idea?
Each feature gets its own coefficient; the model minimizes the same sum of squared residuals. Each coefficient is the change in output per one-unit change in that feature, holding the others fixed.
Q. What is R-squared, in one line?
The fraction of the variation in the output that the line explains, typically on a scale from 0 to 1. Higher means the line captures more of the story.
Q. Why is linear regression called interpretable?
Its coefficients can be read in plain language (so much change in output per unit of input), unlike the opaque weights of larger models.
Q. Name one pitfall of linear regression.
Any of: extrapolating beyond the data range, forcing a line onto a curved relationship, reading a coefficient as a cause, or letting an outlier drag the line (squared residuals are outlier-sensitive).