From a line to a probability: logistic regression
What you’ll learn
This is lesson 4 of Track 10, the opener of Phase 2 (Teaching a machine to decide). By the end you will be able to explain how logistic regression turns a set of inputs into a probability between 0 and 1, and pinpoint where its decision boundary sits. The one capability to walk away with: given a fitted logistic model, compute the probability it assigns, read off the yes/no decision, and say exactly where the line between the two classes falls.
The track structurally mirrors StatQuest’s intuition-first machine learning videos, with Microsoft’s “ML For Beginners” as the hands-on companion for readers who want to build the models in code. Full attribution is in this lesson’s references.
Where this fits
Phase 1 was about predicting numbers and the machinery of learning. This phase turns to classification, predicting categories, and logistic regression is the bridge: it is classification built directly on the linear regression you already know, plus one new idea (the squash). It is also the first model we fit using gradient descent, so it cashes in lesson 3. From here the phase moves to approaches that do not draw a single straight boundary at all, starting with decision trees in the next lesson.
Before you start
Prerequisites: Lesson 2, Fitting a line: linear regression (logistic regression reuses the weighted-sum line at its core), and Lesson 3, How models actually learn: gradient descent (it is how logistic regression is fit). No calculus required; the sigmoid is presented as an S-shaped squashing curve, not a derivation.
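To make the "S-shaped squashing curve" concrete before the lesson proper, here is a minimal sketch of the standard sigmoid function evaluated at a few inputs. The function name `sigmoid` and the sample values are chosen for illustration only; nothing here is taken from the track's companion code.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative inputs: large negative, small negative, zero, small positive, large positive.
for z in (-6, -2, 0, 2, 6):
    print(f"z = {z:+d}  ->  sigmoid(z) = {sigmoid(z):.3f}")
```

Large negative inputs land near 0, large positive inputs land near 1, and an input of exactly 0 lands at 0.5, which is the midpoint of the S.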
By the end, you’ll be able to
- Explain why a straight line fails for a yes/no label
- Describe how the sigmoid turns a linear combination into a probability
- Locate the decision boundary (probability 0.5, where the linear part is zero)
- Compute a probability and a decision from a fitted model by hand
- Explain that the model is fit by gradient descent and that the decision threshold is movable
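As a preview of those objectives, the sketch below walks through one hand computation in code. It assumes a one-input model with made-up parameters (`WEIGHT = 0.8`, `BIAS = -4.0`); these numbers, and the helper names, are hypothetical and exist only to show the steps, not to represent any model fitted in this track.

```python
import math

# Hypothetical fitted parameters, invented for illustration only.
WEIGHT = 0.8
BIAS = -4.0


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x: float) -> float:
    """Linear part first (weight * x + bias), then the squash."""
    return sigmoid(WEIGHT * x + BIAS)


def decide(probability: float, threshold: float = 0.5) -> bool:
    """Turn a probability into a yes/no answer at a chosen threshold."""
    return probability >= threshold


def boundary() -> float:
    """Input value where the linear part is zero, so the probability is 0.5."""
    return -BIAS / WEIGHT


x = 6.0
p = predict_probability(x)
print(f"boundary at x = {boundary():.2f}")
print(f"p(yes | x = {x}) = {p:.3f}")
print(f"decision at threshold 0.5: {decide(p)}")
print(f"decision at threshold 0.8: {decide(p, threshold=0.8)}")
```

With these assumed numbers the boundary sits at x = 5, an input of 6 gets a probability of roughly 0.69, and the same input flips from yes to no when the threshold is raised from 0.5 to 0.8.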
Time and difficulty
- Read time: about 12 minutes
- Practice time: about 15 minutes (a probability-and-boundary computation, a threshold-tradeoff question, and flashcards)
- Difficulty: standard