scikits.learn.svm
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outlier detection.
The advantages of Support Vector Machines are:
- Effective in high dimensional spaces.
- Still effective in cases where number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different svm_kernels can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
The disadvantages of Support Vector Machines include:
- If the number of features is much greater than the number of samples, the method is likely to give poor performance.
- SVMs do not directly provide probability estimates; these must be calculated using indirect techniques. In our case, this means conducting five-fold cross-validation, so performance can suffer. See the method predict_proba for more information.
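A minimal sketch of this (the probability keyword appears in the constructor signature shown further below; concrete values are omitted since they depend on the fitted model):

>>> from scikits.learn import svm
>>> clf = svm.SVC(probability=True)                 # ask for probability estimates
>>> trained = clf.fit([[0., 0.], [1., 1.]], [0, 1]) # triggers five-fold cross-validation
>>> proba = trained.predict_proba([[2., 2.]])       # one estimate per class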
Suppose some given data points each belong to one of N classes, and the goal is to decide which class a new data point will be in. The classes that perform this task are SVC, NuSVC and LinearSVC.
SVC and NuSVC are similar methods, but accept slightly different sets of parameters and have different mathematical formulations (see section svm_mathematical_formulation). On the other hand, LinearSVC is another implementation of SVC optimized for the case of a linear kernel. Note that LinearSVC does not accept the keyword 'kernel', as this is assumed to be linear. It also lacks some of the members of SVC and NuSVC, like support_.
As with other classifiers, SVC and NuSVC have to be fitted with two arrays: an array X of size [n_samples, n_features] holding the training samples, and an array Y of size [n_samples] holding the target values (class labels) for the training samples:
>>> from scikits.learn import svm
>>> X = [[0., 0.], [1., 1.]]
>>> Y = [0, 1]
>>> clf = svm.SVC()
>>> clf.fit(X, Y)
SVC(kernel='rbf', C=1.0, probability=False, degree=3, coef0=0.0, eps=0.001,
    cache_size=100.0, shrinking=True, gamma=0.5)
After being fitted, the model can then be used to predict new values:
>>> clf.predict([[2., 2.]])
array([ 1.])
Examples: example_svm_plot_iris.py, example_svm_plot_separating_hyperplane.py, example_svm_plot_svm_anova.py, example_svm_plot_svm_nonlinear.py
The method of Support Vector Classification can be extended to solve regression problems. This method is called Support Vector Regression.
The model produced by support vector classification (as described above) depends only on a subset of the training data, because the cost function for building the model does not care about training points that lie beyond the margin. Analogously, the model produced by Support Vector Regression depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction.
There are two flavours of Support Vector Regression: SVR and NuSVR.
As with classification, the fit method takes argument vectors X, y; the only difference is that in this case y is expected to have floating point values instead of integer values.
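As a minimal sketch with default parameters (the exact prediction depends on the model settings):

>>> from scikits.learn import svm
>>> X = [[0., 0.], [2., 2.]]
>>> y = [0.5, 2.5]                            # floating point targets
>>> clf = svm.SVR()
>>> trained = clf.fit(X, y)                   # same fit signature as for classification
>>> prediction = trained.predict([[1., 1.]])  # a float, not a class label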
Example: example_svm_plot_svm_regression.py
One-class SVM is used for outlier detection, that is, given a set of samples, it will detect the soft boundary of that set so as to classify new points as belonging to that set or not. The class that implements this is called OneClassSVM.
In this case, as it is a form of unsupervised learning, the fit method takes only an array X as input, since there are no class labels.
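A minimal sketch; we assume libsvm's labeling convention of +1 for points inside the boundary and -1 outside:

>>> from scikits.learn import svm
>>> X = [[0.], [0.5], [1.]]              # unlabeled samples
>>> clf = svm.OneClassSVM()
>>> trained = clf.fit(X)                 # no class labels are passed
>>> inside = trained.predict([[0.4]])    # +1/-1: inside or outside the boundary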
Example: example_svm_plot_oneclass.py
scikits.learn.sparse.svm
There is support for sparse data given in any matrix format supported by scipy.sparse. Classes have the same names, just prefixed by the sparse namespace, and take the same arguments, with the exception of training and test data, which are expected to be in a matrix format defined in scipy.sparse.
For maximum efficiency, use the CSR matrix format as defined in scipy.sparse.csr_matrix.
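A minimal sketch, assuming the class can be imported directly from the sparse namespace named above:

>>> import scipy.sparse
>>> from scikits.learn.sparse.svm import SVC             # same name, sparse namespace
>>> X = scipy.sparse.csr_matrix([[0., 0.], [1., 1.]])    # CSR for maximum efficiency
>>> clf = SVC()
>>> trained = clf.fit(X, [0, 1])                         # same arguments as the dense class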
See the complete listing of classes in sparse_svm_class_reference.
- Support Vector Machine algorithms are not scale invariant, so it is highly recommended to scale your data. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1 (see the sketch after this list). Note that the same scaling must be applied to the test vector to obtain meaningful results. See The CookBook for some examples on scaling.
- Parameter nu in NuSVC/OneClassSVM/NuSVR approximates the fraction of training errors and support vectors.
- If data for classification are unbalanced (e.g. many positive and few negative), try different penalty parameters C.
- Specify a larger cache size (keyword cache_size) for huge problems.
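For instance, the standardization recommended in the first tip can be done with plain numpy (a sketch, not tied to any scikits.learn helper):

>>> import numpy as np
>>> X = np.array([[0., 10.], [1., 20.], [2., 40.]])
>>> mean, std = X.mean(axis=0), X.std(axis=0)
>>> X_scaled = (X - mean) / std              # mean 0, variance 1 per attribute
>>> X_test = np.array([[1., 30.]])
>>> X_test_scaled = (X_test - mean) / std    # reuse the training statistics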
The kernel function can be any of the following:
- linear: ⟨x, x′⟩.
- polynomial: (γ⟨x, x′⟩ + r)^d, where d is specified by keyword degree.
- rbf: exp(−γ|x − x′|²), γ > 0, where γ is specified by keyword gamma.
- sigmoid: tanh(γ⟨x, x′⟩ + r).
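As a quick numerical check of the rbf formula above (plain numpy, hypothetical values):

>>> import numpy as np
>>> x, xp = np.array([0., 1.]), np.array([1., 1.])
>>> gamma = 0.5
>>> value = np.exp(-gamma * np.sum((x - xp) ** 2))   # exp(-gamma |x - x'|^2)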
Different kernels are specified by keyword kernel at initialization:
>>> linear_svc = svm.SVC(kernel='linear')
>>> linear_svc.kernel
'linear'
>>> rbf_svc = svm.SVC(kernel='rbf')
>>> rbf_svc.kernel
'rbf'
You can define your own kernels by either giving the kernel as a Python function or by precomputing the Gram matrix.
Classifiers with custom kernels behave the same way as any other classifiers, except that:
- Support vectors no longer hold the training vectors themselves; instead, they store the indices of the support vectors within the training vectors.
- A reference (and not a copy) of the first argument in the fit() method is stored for future reference. If that array changes between the calls to fit() and predict(), you will get unexpected results.
You can also use your own kernels by passing a function to the keyword kernel in the constructor.
Your kernel must take two matrices as arguments and return a third matrix (the kernel matrix).
The following code defines a linear kernel and creates a classifier instance that will use that kernel:
>>> import numpy as np
>>> from scikits.learn import svm
>>> def my_kernel(x, y):
...     # a simple linear kernel: the matrix of dot products
...     return np.dot(x, y.T)
...
>>> clf = svm.SVC(kernel=my_kernel)
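The classifier can then be fitted and used like any other; a brief usage sketch with the toy data from the classification example:

>>> X = [[0., 0.], [1., 1.]]
>>> Y = [0, 1]
>>> trained = clf.fit(X, Y)                # the custom kernel is used during fitting
>>> labels = trained.predict([[2., 2.]])   # and again at prediction time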
To use a precomputed kernel, set kernel='precomputed' and pass the Gram matrix instead of X to the fit method.
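A minimal sketch for a linear kernel; note that at prediction time the matrix passed to predict must hold the kernel values between the test vectors and the training vectors:

>>> import numpy as np
>>> from scikits.learn import svm
>>> X = np.array([[0., 0.], [1., 1.]])
>>> Y = [0, 1]
>>> clf = svm.SVC(kernel='precomputed')
>>> gram = np.dot(X, X.T)                           # Gram matrix of the training data
>>> trained = clf.fit(gram, Y)
>>> X_test = np.array([[2., 2.]])
>>> labels = trained.predict(np.dot(X_test, X.T))   # kernel between test and training vectors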
Example: example_svm_plot_custom_kernel.py
A support vector machine constructs a hyperplane or set of hyperplanes in a high or infinite dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training datapoints of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.
Given training vectors xᵢ ∈ Rⁿ, i = 1, ..., l, in two classes, and a vector y ∈ Rˡ such that yᵢ ∈ {1, −1}, SVC solves the following primal problem:

    min over w, b, ζ:   (1/2)‖w‖² + C Σᵢ ζᵢ
    subject to:         yᵢ(wᵀϕ(xᵢ) + b) ≥ 1 − ζᵢ,  ζᵢ ≥ 0,  i = 1, ..., l

Its dual is

    min over α:   (1/2)αᵀQα − eᵀα
    subject to:   yᵀα = 0,  0 ≤ αᵢ ≤ C,  i = 1, ..., l

where e is the vector of all ones, C > 0 is the upper bound, Q is an l by l positive semidefinite matrix with Qᵢⱼ ≡ yᵢyⱼK(xᵢ, xⱼ), and K(xᵢ, xⱼ) = ϕ(xᵢ)ᵀϕ(xⱼ) is the kernel. Here training vectors are mapped into a higher (maybe infinite) dimensional space by the function ϕ.
The decision function is:

    sgn(Σᵢ yᵢαᵢ K(xᵢ, x) − ρ)
These parameters can be accessed through the members dual_coef_ and intercept_:
- Member dual_coef_ holds the products yᵢαᵢ.
- Member intercept_ of the classifier holds −ρ.
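A short sketch, assuming the member names above are present in this release (a linear kernel keeps the numbers easy to interpret):

>>> from scikits.learn import svm
>>> clf = svm.SVC(kernel='linear')
>>> trained = clf.fit([[0., 0.], [1., 1.]], [0, 1])
>>> coeffs = trained.dual_coef_     # the products y_i * alpha_i
>>> rho = -trained.intercept_       # intercept_ holds -rho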
This algorithm is implemented as described in "Automatic Capacity Tuning of Very Large VC-dimension Classifiers" and "Support-vector networks".
We introduce a new parameter ν which controls the number of support vectors and training errors. The parameter ν ∈ (0, 1] is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
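For instance (a sketch; the chosen value of ν is arbitrary):

>>> from scikits.learn import svm
>>> clf = svm.NuSVC(nu=0.5)    # at most 50% training errors, at least 50% support vectors
>>> trained = clf.fit([[0., 0.], [1., 1.]], [0, 1])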
Internally, we use libsvm to handle all computations. Libsvm is wrapped using C and Cython.
For a description of the implementation and details of the algorithms used, please refer to