References: Fitting a line: linear regression

Source material

Source material (conceptual spine):
• StatQuest with Josh Starmer:
  "Linear Models Part 0: Fitting a line to data, aka Least Squares, aka Linear Regression"
  Creator: Josh Starmer
  YouTube: https://www.youtube.com/watch?v=PaFPbb66DxQ
  Channel / site: https://statquest.org/
  License: as published on StatQuest's public YouTube channel (link-out only)

Source material (hands-on companion):
• Microsoft: "ML For Beginners" (Regression module)
  Repository: https://github.com/microsoft/ML-For-Beginners
  License: MIT

Clawdemy provides original notes, summaries, and quizzes derived from this material
for educational purposes. All rights to the original videos and curriculum remain
with their creators.

What this lesson draws from each source

StatQuest’s least-squares video anchors the core definition of best fit (minimize the sum of squared residuals, or SSR) and the intuition for why residuals are squared. The worked numeric comparison of two candidate lines is Clawdemy’s own, built to make “lower SSR wins” concrete.
Microsoft’s ML-For-Beginners Regression module is the hands-on companion: it builds and fits regression models in Python with scikit-learn, which is the natural next step once the concept is clear.

The framing of the slope and intercept as a model’s “parameters” (and the bridge to neural-network weights) is Clawdemy’s own connective tissue across the track.
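
As a reminder of what “lower SSR wins” means in practice, here is a minimal sketch in plain Python. The four data points and the two candidate lines are hypothetical values chosen for illustration; they are not the numbers used in the lesson or in either source.

```python
# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]


def ssr(slope, intercept, xs, ys):
    """Sum of squared residuals for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Two candidate lines, each described by its two parameters.
ssr_a = ssr(2.0, 0.0, xs, ys)  # line A: y = 2x
ssr_b = ssr(1.0, 2.0, xs, ys)  # line B: y = x + 2

# The candidate with the lower SSR is the better fit of the two.
best = "A" if ssr_a < ssr_b else "B"
print(f"SSR A = {ssr_a}, SSR B = {ssr_b}, better fit: {best}")
# prints: SSR A = 1.0, SSR B = 5.0, better fit: A
```

Line A misses only one point, by 1, while line B misses two points, by 1 and by 2. Squaring makes the larger miss count four times as much, which is why B scores so much worse.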

Going deeper

StatQuest with Josh Starmer. Beyond the least-squares video, StatQuest has dedicated explainers for R-squared, p-values for regression, and multiple regression. If any piece of this lesson felt fast, the matching StatQuest video slows it down with pictures.
Microsoft ML-For-Beginners: Regression. Project-based regression lessons in scikit-learn. Where this lesson keeps you at the level of intuition, this is where you fit a real model to real data.

Adjacent topics

Gradient descent (the next lesson). This lesson defined the best-fit line but did not show how to find it in general. Gradient descent is the search procedure that finds it, and it underlies far more than regression.
Logistic regression (lesson 4). The same “fit a line” machinery, bent to predict a probability instead of a number, which turns regression into a classifier.
Regularization (ridge and lasso). Techniques that add a penalty on large coefficients to the least-squares objective, which reduces overfitting when there are many features. This track folds the idea into the bias-variance lesson in Phase 4 rather than giving it a standalone lesson.
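
As a rough preview of the gradient descent entry above, the sketch below searches for a slope and intercept by repeatedly stepping against the gradient of the sum of squared residuals. The data, the function name `gradient_descent`, the learning rate, and the step count are all assumptions made for this illustration; the next lesson may present the procedure differently.

```python
# Hypothetical data lying exactly on y = 2x + 1, so the target is known.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def gradient_descent(xs, ys, learning_rate=0.02, steps=2000):
    """Minimize the sum of squared residuals over slope and intercept."""
    slope, intercept = 0.0, 0.0  # arbitrary starting guess
    for _ in range(steps):
        residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Partial derivatives of the SSR with respect to each parameter.
        d_slope = -2 * sum(r * x for r, x in zip(residuals, xs))
        d_intercept = -2 * sum(residuals)
        # Step downhill: move each parameter against its derivative.
        slope -= learning_rate * d_slope
        intercept -= learning_rate * d_intercept
    return slope, intercept


slope, intercept = gradient_descent(xs, ys)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The learning rate here was picked to be small enough for this particular dataset; a larger value can make the updates overshoot and diverge.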

Community discussion

None selected for this lesson. The standard introductions to linear regression are well covered by the StatQuest and Microsoft resources above. If a canonical discussion surfaces, it will be added at the next review.