Cheatsheet: Drawing the widest margin: support vector machines

| Term | Meaning |
| --- | --- |
| Boundary chosen | the maximum-margin one (middle of the widest street) |
| Margin | width of the street; distance from boundary to nearest points |
| Support vectors | the nearest points, on the street edges; they alone set the boundary |
| Memory efficiency | only support vectors matter; the rest can be dropped |
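
To make the terms above concrete, here is a minimal sketch that measures the margin of a given linear boundary and picks out the points sitting on the street edges. The toy points and the boundary `w`, `b` are hand-picked assumptions for illustration, not the output of an SVM solver.

```python
import math

# Hypothetical toy data: (point, label) with labels -1 / +1.
points = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((0.0, 1.0), -1),
          ((2.0, 2.0), +1), ((3.0, 2.0), +1), ((2.0, 3.0), +1)]

# Assumed boundary w.x + b = 0, chosen by hand for this data.
w = (1.0, 1.0)
b = -2.5

def distance_to_boundary(x, w, b):
    """Perpendicular distance from point x to the line w.x + b = 0."""
    return abs(w[0] * x[0] + w[1] * x[1] + b) / math.hypot(w[0], w[1])

distances = [(distance_to_boundary(x, w, b), x) for x, _ in points]

# Margin: distance from the boundary to the nearest points.
margin = min(d for d, _ in distances)

# Support vectors: the points that sit exactly at that distance.
support_vectors = [x for d, x in distances if math.isclose(d, margin)]

# Dropping every other point leaves the margin unchanged.
margin_from_sv_only = min(distance_to_boundary(x, w, b) for x in support_vectors)

print(round(margin, 4), support_vectors)
```

Three of the six points end up as support vectors here; the other three could be removed without moving the street.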
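
| Setting | Effect |
| --- | --- |
| Allow more violations | wider street, more forgiving, often generalizes better |
| Allow fewer violations | narrower street, fits training data tightly, risks overfitting |
| The dial | parameter C (larger C penalizes violations more, so fewer are allowed); this is the bias-variance tradeoff |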
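
The effect of the dial can be sketched by scoring two candidate boundaries with the usual soft-margin objective, `0.5 * w**2 + C * (sum of hinge losses)`. This assumes the standard formulation, which the cheatsheet does not spell out. The 1-D data and both candidates are hypothetical, and the code only compares them; it does not train anything.

```python
# Hypothetical 1-D toy data: (x, label). The point at 0.5 sits close to the positives.
data = [(-3.0, -1), (-2.0, -1), (0.5, -1), (1.0, +1), (2.0, +1), (3.0, +1)]

# Two hand-picked candidate boundaries f(x) = w*x + b (not produced by a solver).
candidates = {
    "wide": (2.0 / 3.0, 1.0 / 3.0),  # street edges at x = -2 and x = 1; x = 0.5 violates
    "narrow": (4.0, -3.0),           # street edges at x = 0.5 and x = 1; no violations
}

def objective(w, b, C):
    """Soft-margin objective: small w means a wide street, hinge terms count violations."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return 0.5 * w * w + C * hinge

def preferred(C):
    """Name of the candidate with the lower objective for this C."""
    return min(candidates, key=lambda name: objective(*candidates[name], C))

for C in (0.1, 100.0):
    print(C, preferred(C))
```

With a small C the wide street wins despite its one violation; with a large C the violation becomes too expensive and the narrow street wins.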

| Step | What happens |
| --- | --- |
| Problem | classes not separable by a straight line |
| Lift | map data into a higher dimension via a kernel |
| Separate | a flat boundary works in the higher dimension |
| Fold back | that boundary is curved in the original space |
| Shortcut | a kernel function computes relationships without the coordinates |
| Common kernels | polynomial, radial basis function (RBF) |
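
The shortcut row can be checked numerically. The sketch below assumes a degree-2 polynomial kernel `(x.z)**2` on 2-D inputs and compares it with the dot product of explicitly lifted coordinates. The feature map `phi`, the sample points, and the `gamma` default are illustrative choices, not values from the cheatsheet.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def phi(x):
    """Explicit lift from 2-D to 3-D matching the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Same similarity, computed without ever building the lifted coordinates."""
    return dot(x, z) ** 2

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: similarity decays with squared distance (gamma is an assumed value)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(x, z)))

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(z))  # lift first, then take the dot product
shortcut = poly_kernel(x, z)    # kernel only

print(math.isclose(explicit, shortcut))
```

The two numbers agree up to floating-point rounding, which is why the lifted coordinates never need to be stored.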

| Class | Original (x) | Squared (x^2) |
| --- | --- | --- |
| IN | -2, -1, 0, 1, 2 | 0 to 4 |
| OUT | -4, -3, 3, 4 | 9 to 16 |
| Separator | none on x | threshold ~6.5 on x^2 |
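
The table above can be reproduced in a few lines. This sketch uses the same IN and OUT values; the helper names are hypothetical, and the threshold is simply the midpoint between the largest IN square and the smallest OUT square.

```python
inside = [-2, -1, 0, 1, 2]
outside = [-4, -3, 3, 4]

def separable_by_threshold(a, b):
    """True if a single cut point puts all of a on one side and all of b on the other."""
    return max(a) < min(b) or max(b) < min(a)

# Lift: replace each x with x^2.
sq_inside = [x * x for x in inside]
sq_outside = [x * x for x in outside]

# Midpoint of the gap between the two classes in the lifted space.
threshold = (max(sq_inside) + min(sq_outside)) / 2

def predict(x):
    """Flat rule on x^2; folded back to x it is the curved rule |x| < sqrt(threshold)."""
    return "IN" if x * x < threshold else "OUT"

print(separable_by_threshold(inside, outside))        # no single cut on x
print(separable_by_threshold(sq_inside, sq_outside))  # one cut works on x^2
print(threshold)
```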

| Strengths | Weaknesses |
| --- | --- |
| Effective in high dimensions | Slow on very large datasets |
| Memory-efficient (support vectors only) | Sensitive to kernel/parameter choice |
| Non-linear via kernels | No native probabilities; less interpretable |

Gotcha: distance-based, so you MUST scale features first
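
A small sketch of why scaling matters for anything distance-based. The rows are hypothetical (age in years, income in dollars), and standardization to zero mean and unit variance is one common choice of scaling, assumed here for illustration.

```python
import math
import statistics

# Hypothetical rows: (age in years, income in dollars).
rows = [(25, 40000), (60, 41000), (27, 52000), (45, 90000), (33, 65000)]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def nearest_index(rows, i):
    """Index of the row closest to rows[i] by Euclidean distance."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))

raw_nearest = nearest_index(rows, 0)                  # income dominates the distance
scaled_nearest = nearest_index(standardize(rows), 0)  # both features contribute

print(rows[raw_nearest], rows[scaled_nearest])
```

Before scaling, the 25-year-old's nearest neighbour is the 60-year-old with a similar income, because a difference of 1,000 dollars swamps a difference of 35 years. After scaling, the nearest neighbour is the 27-year-old.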