Practice: From a line to a probability: logistic regression

Self-check

Seven short questions. Try to answer each one before opening its answer.

1. Why does a straight line fail when you try to predict a yes/no label directly?

Show answer

A line is unbounded, so it predicts values above 1 and below 0, which are nonsense as probabilities. It is also the wrong shape for yes/no data and gets dragged around by extreme points. We need an output bounded between 0 and 1.
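To see the problem in numbers, here is a minimal sketch. The intercept and slope are made up for illustration, not fitted to any real data; the point is only that nothing keeps a line inside 0 to 1.

```python
# Hypothetical line "fitted" to yes/no data. These two numbers are
# invented for illustration only.
intercept = -0.2
slope = 0.15

def line_prediction(x):
    """Plain linear output: nothing keeps it between 0 and 1."""
    return intercept + slope * x

for x in [0, 4, 10]:
    print(x, round(line_prediction(x), 2))
```

With these assumed numbers the line gives -0.2 at x = 0 and 1.3 at x = 10, neither of which can be read as a probability.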

2. What does the sigmoid function do?

Show answer

It squashes any number into the range 0 to 1 along an S-shaped curve. Very negative inputs map near 0, very positive inputs near 1, and an input of 0 maps to exactly 0.5.
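A short sketch of the squashing, using the standard formula sigmoid(z) = 1 / (1 + e^(-z)). The function name and the sample inputs are just choices for this example.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Very negative -> near 0, zero -> exactly 0.5, very positive -> near 1.
for z in [-10, -1, 0, 1, 10]:
    print(z, round(sigmoid(z), 4))
```

This simple version is fine for everyday inputs; for extremely negative z, math.exp can overflow, and a production version would guard against that.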

3. How does logistic regression produce its probability, in two steps?

Show answer

Step one: compute the same weighted sum of inputs as linear regression, z = intercept + coefficient * feature. Step two: pass z through the sigmoid to turn it into a probability between 0 and 1.
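The two steps can be sketched as one small function. The name predict_probability and the numbers passed to it are assumptions for illustration, not values from a fitted model.

```python
import math

def predict_probability(feature, intercept, coefficient):
    # Step one: the same weighted sum as linear regression.
    z = intercept + coefficient * feature
    # Step two: squash z into a probability with the sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: intercept -3.0, coefficient 1.5.
print(predict_probability(2.0, intercept=-3.0, coefficient=1.5))
print(predict_probability(4.0, intercept=-3.0, coefficient=1.5))
```

With these assumed values, a feature of 2.0 gives z = 0 and so a probability of exactly 0.5; a feature of 4.0 gives z = 3 and a probability well above 0.5.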

4. Where is the decision boundary, in terms of the probability and in terms of z?

Show answer

At a probability of 0.5, which is exactly where z = 0 (the linear part equals zero). That set of points is the straight line or flat surface separating the predicted “yes” region from the predicted “no” region.
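For a model with a single feature, setting z = 0 and solving gives the boundary directly: feature = -intercept / coefficient. A minimal sketch, with a hypothetical helper name and made-up model values:

```python
def boundary_feature_value(intercept, coefficient):
    """Feature value where z = intercept + coefficient * feature equals 0."""
    if coefficient == 0:
        raise ValueError("a zero coefficient gives no boundary")
    return -intercept / coefficient

# Hypothetical one-feature model: z = -6 + 2 * feature.
boundary = boundary_feature_value(intercept=-6.0, coefficient=2.0)
print(boundary)
print(-6.0 + 2.0 * boundary)  # z at the boundary
```

With more than one feature the same equation z = 0 describes a line or flat surface rather than a single point.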

5. How is logistic regression fit, and why not least squares?

Show answer

By gradient descent, minimizing a loss built for probabilities (log loss, which punishes confident wrong predictions hard). Unlike least squares for a straight line, there is no tidy closed-form formula, so the best intercept and coefficients are found by search: exactly the downhill walk from lesson 3.
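Here is a minimal sketch of that downhill walk in plain Python, assuming log loss as the loss being minimized. The toy data, learning rate, and step count are all made up for illustration; a real project would normally use a library rather than a hand-written loop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Made-up toy data: one feature, labels 0 or 1. The classes overlap a
# little, so the best coefficients stay finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

def log_loss(intercept, coefficient):
    """Average log loss: large when the model is confident and wrong."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + coefficient * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

intercept, coefficient = 0.0, 0.0
learning_rate = 0.1  # assumed step size
start_loss = log_loss(intercept, coefficient)

for step in range(2000):
    grad_intercept = 0.0
    grad_coefficient = 0.0
    for x, y in zip(xs, ys):
        error = sigmoid(intercept + coefficient * x) - y
        grad_intercept += error
        grad_coefficient += error * x
    # Step downhill along the average gradient.
    intercept -= learning_rate * grad_intercept / len(xs)
    coefficient -= learning_rate * grad_coefficient / len(xs)

end_loss = log_loss(intercept, coefficient)
print("loss before:", round(start_loss, 3), "after:", round(end_loss, 3))
```

The loss starts at about 0.693 (every prediction is 0.5) and falls as the walk proceeds; the coefficient ends up positive because larger feature values go with more “yes” labels in this toy data.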

6. What does a positive coefficient mean in logistic regression?

Show answer

As that feature increases, z increases, which pushes the predicted probability of “yes” up. A negative coefficient pushes it down. The size of the coefficient measures how strongly the feature moves the prediction.
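A quick sketch of the sign at work. The coefficients 0.8 and -0.8 and the zero intercept are arbitrary values chosen for the example.

```python
import math

def probability(feature, intercept, coefficient):
    z = intercept + coefficient * feature
    return 1.0 / (1.0 + math.exp(-z))

features = [0, 1, 2, 3]

# Same features, opposite signs on a hypothetical coefficient.
rising = [probability(x, 0.0, 0.8) for x in features]
falling = [probability(x, 0.0, -0.8) for x in features]

print([round(p, 2) for p in rising])   # climbs as the feature grows
print([round(p, 2) for p in falling])  # drops as the feature grows
```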

7. Why is the name “logistic regression” misleading?

Show answer

Because it is a classifier, not a method for predicting a number. The “regression” is historical. Its output is a probability of belonging to a class, which you turn into a yes/no decision with a threshold.

Try it yourself: probability, decision, and boundary

A spam model uses the number of links in an email: z = -1 + (0.5 * links), and probability of spam = sigmoid(z). Use these sigmoid values: sigmoid(-1) ~ 0.27, sigmoid(0) = 0.50, sigmoid(1) ~ 0.73. For 0, 2, and 4 links, find z, the probability of spam, and the yes/no decision at a 0.5 threshold. Then state the decision boundary (the number of links where the model is exactly on the fence).

Show answer

links = 0  ->  z = -1 + 0   = -1  ->  prob ~ 0.27  ->  NOT spam
links = 2  ->  z = -1 + 1   =  0  ->  prob = 0.50  ->  on the fence
links = 4  ->  z = -1 + 2   =  1  ->  prob ~ 0.73  ->  SPAM

The decision boundary is at 2 links, because that is where z = 0 and the probability is exactly 0.50. Below 2 links the model leans not-spam; above 2 links it leans spam. Notice you did not actually need the sigmoid values to make the decision: a probability is at least 0.5 exactly when z is at least 0, so the sign of z already tells you the call.
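If you want to check the table by machine, this sketch recomputes it from the exercise's formula z = -1 + (0.5 * links). The function names are just labels chosen for this example, and the call is made from the sign of z, as the answer suggests.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spam_z(links):
    # The model from the exercise: z = -1 + (0.5 * links).
    return -1 + 0.5 * links

for links in [0, 2, 4]:
    z = spam_z(links)
    prob = sigmoid(z)
    # The sign of z alone settles the decision.
    if z > 0:
        call = "SPAM"
    elif z < 0:
        call = "NOT spam"
    else:
        call = "on the fence"
    print(links, z, round(prob, 2), call)
```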

Try it yourself: move the threshold

A fraud model outputs a probability of 0.40 for a transaction. With the default 0.5 threshold it predicts “not fraud” and lets it through. Missing real fraud is far more costly than a false alarm. What can you change, and what is the cost of changing it?

Show answer

Lower the decision threshold, say to 0.3, so more transactions get flagged as fraud (this one, at 0.40, now would be). That catches more real fraud. The cost is more false alarms: legitimate transactions wrongly flagged, which annoys customers and creates review work. The threshold is a dial between “catch more fraud” and “raise fewer false alarms,” and 0.5 is just the default position. The evaluation phase makes this tradeoff precise with precision and recall.
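The dial can be sketched in a few lines. The function name decide and the list of probabilities are hypothetical; only the 0.40 transaction and the 0.5 and 0.3 thresholds come from the exercise.

```python
def decide(probability, threshold=0.5):
    """Flag as fraud when the probability reaches the threshold."""
    return "fraud" if probability >= threshold else "not fraud"

# Made-up model outputs for five transactions; 0.40 is the one from
# the exercise.
probabilities = [0.05, 0.25, 0.40, 0.55, 0.90]

for threshold in [0.5, 0.3]:
    flagged = [p for p in probabilities if decide(p, threshold) == "fraud"]
    print(threshold, flagged)
```

Lowering the threshold from 0.5 to 0.3 flags one more of these transactions. Whether that extra flag is caught fraud or a false alarm depends on the true label, which is the tradeoff described above.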

Flashcards

Ten cards.

Q. Why can't a straight line predict a yes/no label well?

It is unbounded (predicts values above 1 and below 0, nonsense as probabilities) and the wrong shape for yes/no data. We need an output bounded between 0 and 1.

Q. What does the sigmoid function do?

It squashes any number into the range 0 to 1 on an S-curve: very negative maps near 0, very positive near 1, and 0 maps to exactly 0.5.

Q. How does logistic regression produce a probability?

Compute the same weighted sum as linear regression (z = intercept + coefficient times feature), then pass z through the sigmoid to get a probability.

Q. Where is the logistic regression decision boundary?

At probability 0.5, which is exactly where z = 0 (the linear part equals zero). It is a straight line or flat surface separating the two predicted classes.

Q. How is logistic regression fit?

By gradient descent, minimizing a probability-suited loss that punishes confident-wrong predictions. There is no tidy closed-form formula like the one least squares gives for a straight line.

Q. What does a positive coefficient mean?

As the feature increases, z increases and the predicted probability of “yes” goes up. A negative coefficient pushes it down; size measures strength.

Q. Why is the name “logistic regression” misleading?

It is a classifier, not a number-predictor. The output is a class probability, turned into a yes/no decision by a threshold. The “regression” is historical.

Q. Is 0.5 always the right decision threshold?

No. It is the default. When errors have unequal costs or classes are imbalanced, the right cutoff is often higher or lower than 0.5.

Q. What is the shape of a logistic regression's boundary?

Straight (a line or flat surface). If the true boundary curves, logistic regression needs engineered features or a different model.

Q. Where does logistic regression hide inside neural networks?

The final layer of many classifiers squashes scores into probabilities with the sigmoid (or its multi-class cousin, the softmax). A “confidence” percentage usually comes from this.
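To make the “cousin” relationship concrete, here is a small sketch using the standard softmax formula. The scores are made up; the check at the end shows that with two classes, softmax gives the same probability as the sigmoid applied to the difference of the two scores.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(scores)  # subtracting the max avoids overflow
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical final-layer scores for three classes.
three_class = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in three_class], round(sum(three_class), 3))

# With two classes, softmax matches sigmoid of the score difference.
two_class = softmax([1.5, 0.0])
print(round(two_class[0], 4), round(sigmoid(1.5 - 0.0), 4))
```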