Cheatsheet: Drawing the widest margin: support vector machines
The core idea
| Term | Meaning |
|---|---|
| Boundary chosen | the maximum-margin one (middle of the widest street) |
| Margin | width of the street: twice the distance from the boundary to the nearest points |
| Support vectors | the nearest points, on the street edges; they alone set the boundary |
| Memory efficiency | once trained, prediction needs only the support vectors; the other training points can be dropped |
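
The terms above can be made concrete with a small sketch. The six points and the boundary `w`, `b` are hypothetical, hand-picked values (not the output of a trained model); the code only measures distances to that boundary and reports which points sit on the street edges.

```python
import math

# Hypothetical 2D toy data: (point, label) with labels +1 / -1.
points = [
    ((2.0, 0.0), +1), ((3.0, 2.0), +1), ((4.0, -1.0), +1),
    ((-2.0, 0.0), -1), ((-3.0, 1.0), -1), ((-5.0, -2.0), -1),
]

# Assumed boundary w . x + b = 0 (here the vertical line x1 = 0).
w, b = (1.0, 0.0), 0.0

def distance_to_boundary(x):
    """Perpendicular distance from point x to the line w . x + b = 0."""
    return abs(w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

distances = [distance_to_boundary(x) for x, _ in points]
half_width = min(distances)      # boundary to the nearest points
street_width = 2 * half_width    # edge to edge

# Support vectors: the points sitting on the street edges.
support_vectors = [x for (x, _), d in zip(points, distances)
                   if math.isclose(d, half_width)]

print("street width:", street_width)
print("support vectors:", support_vectors)
```

Only `(2.0, 0.0)` and `(-2.0, 0.0)` touch the edges here, so moving or deleting any of the other four points would leave this boundary unchanged.
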
Soft margin
| Setting | Effect |
|---|---|
| Allow more violations | wider street, more forgiving, often generalizes better; too forgiving underfits |
| Allow fewer violations | narrower street, fits training data tightly, risks overfitting |
| The dial | parameter C (low C tolerates more violations, high C fewer); this is the bias-variance tradeoff |
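
One way to see the dial at work is to score two candidate boundaries with the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). This is a minimal sketch, not a solver: the 1D data and the two candidates `narrow` and `wide` are hypothetical and hand-picked, and the code only compares their scores at a low and a high C.

```python
# Hypothetical 1D toy data: (x, label). The positive point at x = -1
# sits close to the negative class.
data = [(-3.0, -1), (-2.0, -1), (-1.0, +1), (2.0, +1), (3.0, +1)]

def objective(w, b, C):
    """Soft-margin objective: 0.5 * w^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return 0.5 * w * w + C * hinge

# Two hand-picked candidates f(x) = w * x + b (street width is 2 / |w|).
narrow = (2.0, 3.0)  # boundary at x = -1.5, width 1, no violations
wide = (0.5, 0.0)    # boundary at x = 0, width 4, the point at x = -1 violates

for C in (0.1, 10.0):
    scores = {"narrow": objective(*narrow, C), "wide": objective(*wide, C)}
    print(C, "prefers", min(scores, key=scores.get), scores)
```

With the low C the wide street scores lower (better) despite its one violation; with the high C the violation becomes too expensive and the narrow street wins.
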
The kernel trick
| Step | What happens |
|---|---|
| Problem | classes not separable by a straight line |
| Lift | map data into a higher dimension via a kernel |
| Separate | a flat boundary works in the higher dimension |
| Fold back | that boundary is curved in the original space |
| Shortcut | a kernel function computes similarities (inner products) in the lifted space without ever computing the lifted coordinates |
| Common kernels | polynomial, radial basis function (RBF) |
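
The shortcut can be checked numerically. The sketch below assumes a degree-2 polynomial kernel with no constant term, `(x . z)^2`, and shows that it gives the same number as an inner product taken after an explicit 3D lift. The function names, the sample points, and the `gamma` default in the RBF helper are illustrative choices, not taken from the cheatsheet.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (no constant term): (x . z)^2."""
    return dot(x, z) ** 2

def lift(x):
    """Explicit 3D lift of a 2D point that matches poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: exp(-gamma * squared distance); gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly2_kernel(x, z)   # stays in 2D
via_lift = dot(lift(x), lift(z))  # computes the 3D coordinates first
print(via_kernel, via_lift)       # equal up to floating-point rounding
print(rbf_kernel(x, z))
```

The RBF kernel corresponds to a lift into infinitely many dimensions, so there the explicit route is not available at all and the kernel function is the only practical way to get the similarity.
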
Lifting example (squaring a 1D feature)
| Class | Original (x) | Squared (x^2) |
|---|---|---|
| IN | -2,-1,0,1,2 | 0 to 4 |
| OUT | -4,-3,3,4 | 9 to 16 |
| Separator | no single threshold on x | threshold ~6.5 on x^2 (midway between 4 and 9) |
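
The table can be verified in a few lines. This sketch uses the IN and OUT values from the table; the helper `separates` is a hypothetical name introduced here. It tries a threshold in every gap between neighbouring x values, then tries the single threshold 6.5 on x^2.

```python
inside = [-2, -1, 0, 1, 2]
outside = [-4, -3, 3, 4]

def separates(values_in, values_out, threshold):
    """True if one threshold puts every IN value on one side, every OUT on the other."""
    below = (all(v < threshold for v in values_in)
             and all(v > threshold for v in values_out))
    above = (all(v > threshold for v in values_in)
             and all(v < threshold for v in values_out))
    return below or above

# On raw x: try a threshold in every gap between neighbouring sorted values.
xs = sorted(inside + outside)
gaps = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
works_on_x = any(separates(inside, outside, t) for t in gaps)

# On x^2: the single threshold 6.5 sits between 4 and 9.
sq_in = [x * x for x in inside]
sq_out = [x * x for x in outside]
works_on_x2 = separates(sq_in, sq_out, 6.5)

print(works_on_x, works_on_x2)  # False True
```
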
Strengths, weaknesses, gotcha
| Strengths | Weaknesses |
|---|---|
| Effective in high dimensions | Slow on very large datasets |
| Memory-efficient (support vectors only) | Sensitive to kernel/parameter choice |
| Non-linear via kernels | No native probabilities; less interpretable |

Gotcha: SVMs are distance-based, so you MUST scale features first.
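
A small sketch of why scaling matters: with unscaled features, the column with the largest numeric range dominates every distance. The three rows (age in years, income in dollars) are hypothetical, and the hand-rolled `standardize` stands in for what a library scaler such as scikit-learn's `StandardScaler` would typically do.

```python
import math
import statistics

# Hypothetical rows: (age in years, income in dollars).
rows = {"A": (25.0, 50000.0), "B": (60.0, 50500.0), "C": (26.0, 50800.0)}

def nearest_to_a(table):
    """Name of the row (other than A) closest to A in Euclidean distance."""
    return min((name for name in table if name != "A"),
               key=lambda name: math.dist(table["A"], table[name]))

def standardize(table):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*table.values()))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return {name: tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for name, row in table.items()}

print("raw:   ", nearest_to_a(rows))               # B
print("scaled:", nearest_to_a(standardize(rows)))  # C
```

On the raw numbers, B looks closest to A because a 35-year age gap is tiny next to income differences in the hundreds. After standardizing, both columns count comparably and C becomes the nearer neighbour.
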