Skip to content

Latest commit

 

History

History
361 lines (236 loc) · 10.6 KB

svm.rst

File metadata and controls

361 lines (236 loc) · 10.6 KB

Support Vector Machines

scikits.learn.svm

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlayer detection.

The advantages of Support Vector Machines are:

  • Effective in high dimensional spaces.
  • Still effective in cases where number of dimensions is greater than the number of samples.
  • Uses a subset of training points in the decission function (called support vectors), so it is also memory efficient.
  • Versatile: different svm_kernels can be specified for the decission function. Common kernels are provided, but it is also possibly to specify custom kernels.

The dissadvantes of Support Vector Machines include:

  • If the number of features is much greater than the number of samples, the method is likely to give poor performance.
  • SVMs do not directly provide probability estimates, so these must be calculated using indirect techniques. In our case, these techniques imply conducting five-fold cross-validation, so performance can suffer. See method predict_proba for more information.

Classification

Suppose some given data points each belong to one of N classes, and the goal is to decide which class a new data point will be in. The classes that permform this task are SVC, NuSVC and LinearSVC.

SVC and NuSVC are similar methods, but accept slightly different set of parameters and have different mathematical formulations (see section svm_mathematical_formulation). On the other hand, LinearSVC is another implementation of SVC optimized in the case of a linear kernel. Note that LinearSVC does not accept keyword 'kernel', as this is assumed to be linear. It also lacks some of the memebrs of SVC and NuSVC, like support_.

As other classifiers, SVC and NuSVC have to be fitted with two arrays: an array X of size [m_samples, n_features] holding the training samples, and an array Y of size [n_samples] holding the target values (class labels) for the training samples:

>>> from scikits.learn import svm
>>> X = [[0., 0.], [1., 1.]]
>>> Y = [0, 1]
>>> clf = svm.SVC()
>>> clf.fit (X, Y)
SVC(kernel='rbf', C=1.0, probability=False, degree=3, coef0=0.0, eps=0.001,
  cache_size=100.0,
  shrinking=True,
  gamma=0.5)

After being fitted, the model can then be used to predict new values:

>>> clf.predict ([[2., 2.]])
array([ 1.])

Examples

example_svm_plot_iris.py, example_svm_plot_separating_hyperplane.py, example_svm_plot_svm_anova.py, example_svm_plot_svm_nonlinear.py

Regression

The method of Support Vector Classification can be extended to solve regression problems. This method is called Support Vector Regression.

The model produced by support vector classification (as described above) depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by Support Vector Regression depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction.

There are two flavours of Support Vector Regression: SVR and NuSVR.

Like in the class of classification, the fit method will take as argument vectors X, y, only that in this case y is expected to have floating point values instead of integer values.

Examples

example_svm_plot_svm_regression.py

Density estimation

One-class SVM is used for outliers detection, that is, given a set of samples, it will detect the soft boundary of that set so as to classify new points as belonging to that set or not. The class that implement this is called OneClassSVM

In this case, as it is a type of unsupervised learning, the fit method will only take as input an array X, as there are no class labels.

Examples

example_svm_plot_oneclass.py

scikits.learn.sparse.svm

Support Vector machines for sparse data

There is support for sparse data given in any matrix in a format supported by scipy.sparse. Classes have the same name, just prefixed by the sparse namespace, and take the same arguments, with the exception of training and test data, which is expected to be in a matrix format defined in scipy.sparse.

For maximum efficiency, use the CSR matrix format as defined in scipy.sparse.csr_matrix.

See the complete listing of classes in sparse_svm_class_reference.

Tips on Practical Use

  • Support Vector Machine algorithms are not scale invariant, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standarize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vector to obtain meaningful results. See The CookBook for some examples on scaling.
  • Parameter nu in NuSVC/OneClassSVM/NuSVR approximates the fraction of training errors and support vectors.
  • If data for classification are unbalanced (e.g. many positive and few negative), try different penalty parameters C.
  • Specify larger cache size (keyworkd cache) for huge problems.

Kernel functions

The kernel function can be any of the following:

  • linear:  < xi, xj′>.
  • polynomial: (γ < x, x′ >  + r)d. d is specified by keyword degree.
  • rbf (exp( − γ|x − x′|2), γ > 0). γ is specified by keyword gamma.
  • sigmoid (tanh( < xi, xj >  + r)).

Different kernels are specified by keword kernel at initialization:

>>> linear_svc = svm.SVC(kernel='linear')
>>> linear_svc.kernel
'linear'
>>> rbf_svc = svm.SVC (kernel='rbf')
>>> rbf_svc.kernel
'rbf'

Custom Kernels

You can define your own kernels by either giving the kernel as a python function or by precomputing the Gram matrix.

Classifiers with custom kernels behave the same way as any other classifiers, except that:

  • Support vectors do no longer represent the vectors, but rather are indices of the support vectors for the training vectors.
  • A reference (and not a copy) of the first argument in the fit() method is stored for future reference. If that array changes between the use of fit() and predict() you will have unexpected results.

Using python functions as kernels

You can also use your own defined kernels by passing a function to the keyword kernel in the constructor.

Your kernel must take as arguments two matrices and return a third matrix.

The following code defines a linear kernel and creates a classifier instance that will use that kernel:

>>> import numpy as np
>>> from scikits.learn import svm
>>> def my_kernel(x, y):
...     return np.dot(x, y.T)
... 
>>> clf = svm.SVC(kernel=my_kernel)

Passing the gram matrix

Set kernel='precomputed' and pass the gram matrix instead of X in the fit method.

Examples

example_svm_plot_custom_kernel.py.

Mathematical formulation

A support vector machine constructs a hyperplane or set of hyperplanes in a high or infinite dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training datapoints of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.

SVC

Given training vectors xi ∈ Rn, i=1,..., l, in two classes, and a vector y ∈ Rl such that yi ∈ 1,  − 1, SVC solves the following primal problem:

$$\min_ {w, b, \zeta} \frac{1}{2} w^T w + C \sum_{i=1, l} \zeta_i$$$$\begin{aligned} \textrm {subject to } & y_i (w^T \phi (x_i) + b) \geq 1 - \zeta_i,\\\ & \zeta_i \geq 0, i=1, ..., l \end{aligned}$$

Its dual is

$$\min_{\alpha} \frac{1}{2} \alpha^T Q \alpha - e^T \alpha$$$$\begin{aligned} \textrm {subject to } & y^T \alpha = 0\\\ & 0 \leq \alpha_i \leq C, i=1, ..., l \end{aligned}$$

where e is the vector of all ones, C > 0 is the upper bound, Q is an l by l positive semidefinite matrix, Qij ≡ K(xi, xj) and ϕ(xi)T phi(x) is the kernel. Here training vectors are mapped into a higher (maybe infinite) dimensional space by the function ϕ

The decision function is:

$$sgn(\sum_{i=1}^l y_i \alpha_i K(x_i, x) + \rho)$$

This parameters can accessed through the memebers support_ and intercept_:

  • Member support_ holds the product yTα
  • Member intercept_ of the classifier holds  − ρ

References

This algorithm is implemented as described in Automatic Capacity Tuning of Very Large VC-dimension Classifiers and Support-vector networks

NuSVC

We introduce a new parameter ν wich controls the number of support vectors and training errors. The parameter ν ∈ (0, 1] is an upper bound on the fraction of training errors and a lower bound of the fraction of support vectors.

Implementation details

Internally, we usel libsvm to handle all computations. Libsvm is wrapped using C and Cython.

References

For a description of the implementation and details of the algorithms used, please refer to