Cheatsheet: Fitting a line: linear regression
The line
| Piece | Symbol | Meaning |
|---|---|---|
| Equation | y = b + m*x | prediction = intercept + slope times feature |
| Slope | m | change in prediction per one-unit increase in input |
| Intercept | b | prediction when the input is zero |
| Parameters | m, b | the model's learned parameters; training chooses their values |
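
The equation is just arithmetic on two numbers. Below is a minimal Python sketch, where `predict` is a hypothetical helper name. It shows that two parameter choices trace two different lines over the same inputs, which is why training has to pick m and b.

```python
def predict(x, m, b):
    """Prediction = intercept + slope * feature (y = b + m*x)."""
    return b + m * x

inputs = [0, 1, 2, 3]

# Same inputs, two different (m, b) choices: two different lines.
print([predict(x, m=2.0, b=1.0) for x in inputs])  # [1.0, 3.0, 5.0, 7.0]
print([predict(x, m=0.5, b=4.0) for x in inputs])  # [4.0, 4.5, 5.0, 5.5]
```

At x = 0 each line returns its intercept b. Each step of +1 in x adds the slope m.
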
Least squares (definition of best fit)
| Step | What you do |
|---|---|
| 1 | Residual = actual minus predicted, for each point |
| 2 | Square each residual (squares are never negative, so misses above and below the line cannot cancel out; big misses cost more) |
| 3 | Add them up: the sum of squared residuals (SSR) |
| 4 | Best-fit line = the one with the smallest possible SSR |
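
The four steps map directly onto a few lines of Python. This is a sketch with hypothetical helper names (`residuals`, `ssr`, `best_of`) and made-up points that lie exactly on y = 1 + 2x. Note that `best_of` only compares a finite list of candidate lines. True least squares minimizes over every possible (m, b).

```python
def residuals(points, m, b):
    """Step 1: actual minus predicted, for each (x, y) point."""
    return [y - (b + m * x) for x, y in points]

def ssr(points, m, b):
    """Steps 2-3: square each residual, then add them up."""
    return sum(r ** 2 for r in residuals(points, m, b))

def best_of(candidates, points):
    """Step 4, limited to a finite list: the (m, b) with the smallest SSR."""
    return min(candidates, key=lambda mb: ssr(points, *mb))

# Illustrative data lying exactly on y = 1 + 2x.
points = [(0, 1), (1, 3), (2, 5)]
candidates = [(2.0, 1.0), (1.0, 2.0)]

print(residuals(points, 1.0, 2.0))  # [-1.0, 0.0, 1.0]
print(ssr(points, 1.0, 2.0))        # 2.0
print(best_of(candidates, points))  # (2.0, 1.0)
```
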
Worked SSR comparison
| Line | Per-point squared residuals | SSR | Verdict |
|---|---|---|---|
| A: y = 1.5x + 0.5 | 0.00, 0.25, 0.00 | 0.25 | better fit |
| B: y = 2x | 0, 0, 1 | 1.00 | worse fit |
(Data points: (1,2), (2,4), (3,5). Lower SSR wins.)
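
The sketch below reproduces the two SSR values in the table. It then computes the actual least-squares line for the same three points using the standard one-feature closed form: m = Sxy / Sxx and b = mean(y) − m·mean(x). Line A beats line B, but neither one is the minimum. The least-squares line is y = 1.5x + 2/3, with SSR = 1/6 ≈ 0.167. The helper names are illustrative.

```python
def ssr(points, m, b):
    """Sum of squared residuals of the line y = b + m*x over (x, y) points."""
    return sum((y - (b + m * x)) ** 2 for x, y in points)

def least_squares_line(points):
    """Closed-form simple linear regression: m = Sxy / Sxx, b = mean(y) - m * mean(x)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, y_mean - m * x_mean

points = [(1, 2), (2, 4), (3, 5)]
print(ssr(points, m=1.5, b=0.5))  # 0.25 (line A)
print(ssr(points, m=2.0, b=0.0))  # 1.0 (line B)

m, b = least_squares_line(points)
print(round(m, 3), round(b, 3), round(ssr(points, m, b), 3))  # 1.5 0.667 0.167
```
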
Reading coefficients
| Coefficient | Reads as |
|---|---|
| Slope = 0.30 | each +1 unit of input adds 0.30 to the predicted output |
| Intercept = 200 | predicted output is 200 when input is 0 |
| Negative slope | output falls as input rises |
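
Each row of the table can be checked numerically. Below is a small sketch using the example coefficients (slope 0.30, intercept 200) and a hypothetical `predict` helper. The intercept is the prediction at input 0. The slope is the same +1 step wherever you start. Flipping the slope's sign makes predictions fall.

```python
def predict(x, slope, intercept):
    """Simple linear model: intercept + slope * x."""
    return intercept + slope * x

slope, intercept = 0.30, 200.0

# Intercept: the prediction when the input is 0.
print(predict(0, slope, intercept))  # 200.0

# Slope: a +1 change in input moves the prediction by the slope, from any start.
for x in (0, 10, 50):
    step = predict(x + 1, slope, intercept) - predict(x, slope, intercept)
    print(x, round(step, 10))  # step is 0.3 each time (rounded to hide float noise)

# Negative slope: a larger input gives a smaller prediction.
print(predict(2, -0.30, 200.0) < predict(1, -0.30, 200.0))  # True
```
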
Multiple regression
| Item | Detail |
|---|---|
| Equation | y = b + m1*x1 + m2*x2 + m3*x3 |
| Each coefficient | change in output per one-unit change in that feature, others held fixed |
| Goal | same: minimize the sum of squared residuals |
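
With several features, minimizing SSR amounts to solving the normal equations (XᵀX)w = Xᵀy. Here X holds one row per data point, with a leading column of 1s for the intercept. The sketch below solves them in pure Python with Gaussian elimination. This choice is for transparency only, because forming XᵀX can lose precision. In practice a library solver such as NumPy's `numpy.linalg.lstsq` is usually the better choice. The data are made up and generated exactly from y = 1 + 2·x1 − 3·x2 + 0.5·x3, so the fit should recover those coefficients. All function names are hypothetical.

```python
def solve(a, rhs):
    """Solve the square system a @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [v] for row, v in zip(a, rhs)]  # augmented matrix [a | rhs]
    for col in range(n):
        # Swap in the row with the largest entry in this column, for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; no unique least-squares fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitute from the last row upward.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w

def fit_multiple(rows, ys):
    """Least squares for y = b + m1*x1 + m2*x2 + ... via (X^T X) w = X^T y."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 column carries the intercept b
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)  # [b, m1, m2, ...]

def predict(row, coef):
    """Prediction = b + m1*x1 + m2*x2 + ..."""
    b, *ms = coef
    return b + sum(m * x for m, x in zip(ms, row))

# Illustrative data generated exactly from y = 1 + 2*x1 - 3*x2 + 0.5*x3 (no noise).
rows = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (2, 1, 0)]
ys = [1.0 + 2.0 * x1 - 3.0 * x2 + 0.5 * x3 for x1, x2, x3 in rows]

coef = fit_multiple(rows, ys)
print([round(c, 6) for c in coef])  # [1.0, 2.0, -3.0, 0.5]

# Raise x1 by one while x2 and x3 stay fixed: the prediction moves by m1.
print(round(predict((3, 2, 1), coef) - predict((2, 2, 1), coef), 6))  # 2.0
```
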
Fit quality and pitfalls
| Idea | Note |
|---|---|
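| R-squared | fraction of the output's variation that the fit explains; between 0 and 1 for a least-squares fit with an intercept, measured on its own training data; higher means more is explained |
| Extrapolation | predictions are unreliable outside the range of inputs seen in the data |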
| Curved data | a straight line fits a curve poorly |
| Coefficient as cause | slope is association, not causation |
| Outliers | squaring makes least squares outlier-sensitive: one far-off point can pull the line strongly |
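
Two rows of this table can be made concrete with the worked-example points (1,2), (2,4), (3,5). R-squared uses the standard definition 1 − SSR/SST, where SST is the sum of squared deviations of y from its mean. For the least-squares line on these points, that gives 27/28 ≈ 0.964. For the outlier row, the sketch adds one made-up point, (4, 20), far above the trend and refits. The slope jumps from 1.5 to 5.5. Helper names and the extra point are illustrative assumptions.

```python
def fit_line(points):
    """Closed-form simple least squares: returns (slope m, intercept b)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, y_mean - m * x_mean

def r_squared(points, m, b):
    """R^2 = 1 - SSR / SST, where SST is the spread of y around its mean."""
    y_mean = sum(y for _, y in points) / len(points)
    ssr = sum((y - (b + m * x)) ** 2 for x, y in points)
    sst = sum((y - y_mean) ** 2 for _, y in points)
    return 1 - ssr / sst

points = [(1, 2), (2, 4), (3, 5)]
m, b = fit_line(points)
print(round(r_squared(points, m, b), 4))  # 0.9643 (= 27/28)

# Outlier sensitivity: one hypothetical point far above the trend drags the slope.
m_out, _ = fit_line(points + [(4, 20)])
print(round(m, 2), round(m_out, 2))  # 1.5 5.5
```
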