Support Vector Machine kernel types


Problem description


Popular kernel functions used in Support Vector Machines are Linear, Radial Basis Function and Polynomial. Can someone please explain in a simple way what these kernel functions are? :) As I am new to this area, I don't clearly understand the importance of these kernel types.

Solution

Let us start from the beginning. A support vector machine is a linear model: it always looks for a hyperplane to separate one class from another. I will focus on the two-dimensional case because it is easier to comprehend and possible to visualize to give some intuition, but bear in mind that the same holds in higher dimensions (lines simply become planes, parabolas become paraboloids, etc.).

Kernels in short

What a kernel does is change the definition of the dot product in the linear formulation. What does that mean? SVM works with dot products, which for finite dimension d are defined as <x,y> = x^T y = SUM_{i=1}^d x_i y_i. This more or less captures the similarity between two vectors (it is also the geometric operation of projection, and it is closely related to the angle between the vectors).

The kernel trick is to replace every occurrence of <x,y> in the SVM math with K(x,y), saying "K is the dot product in SOME space", and for each kernel there exists a mapping f_K such that K(x,y) = <f_K(x), f_K(y)>. The trick is that you never use f_K directly; you only compute the dot products, which saves you tons of time (sometimes an infinite amount, since f_K(x) may have an infinite number of dimensions).

Ok, so what does this mean for us? We still "live" in the space of x, not of f_K(x). The result is quite nice: if you build a hyperplane in the space of f_K that separates your data and then look back at the space of x (you might say you project the hyperplane back through f_K^{-1}), you get non-linear decision boundaries! The type of boundary depends on f_K, and f_K depends on K, so the choice of K will (among other things) affect the shape of your boundary.
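To make the f_K idea concrete, here is a small numeric sketch of my own (not from the original answer), using the degree-2 polynomial kernel K(x,y) = (x·y)^2 in 2d. The explicit feature map f_K shown here is one standard choice; the point is only that the kernel value equals the dot product in the lifted space without ever building f_K(x):

```python
# Minimal sketch of the kernel trick for K(x, y) = (x . y)^2 in 2 dimensions.
# One explicit feature map is f_K(x) = (x1^2, sqrt(2)*x1*x2, x2^2), and then
# K(x, y) == <f_K(x), f_K(y)>.
import numpy as np

def poly2_kernel(x, y):
    """Kernel value computed directly in the original 2-d space."""
    return np.dot(x, y) ** 2

def feature_map(x):
    """Explicit map into the 3-d space induced by the degree-2 kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(poly2_kernel(x, y))                      # 1.0
print(np.dot(feature_map(x), feature_map(y)))  # 1.0 -- the same number
```

The SVM only ever needs these kernel values, which is why the mapping can even be infinite-dimensional (as with RBF below).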

Linear kernel

Here we in fact do not have any kernel; you just have the "normal" dot product, so in 2d your decision boundary is always a straight line.

As you can see, we can separate most of the points correctly, but due to the "stiffness" of our assumption we will never capture all of them.
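If you prefer code to the libSVM applet, here is a hedged sketch of the same experiment with scikit-learn (my choice of library and toy data, not the answer's):

```python
# Fit a linear-kernel SVM on two Gaussian blobs in 2-d.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) - 2, rng.randn(20, 2) + 2]  # two clusters
y = np.r_[np.zeros(20), np.ones(20)]

clf = SVC(kernel="linear", C=1.0).fit(X, y)
# In 2-d the decision boundary is the straight line w . x + b = 0
print(clf.coef_, clf.intercept_)
```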

Polynomial kernel

Here our kernel induces a space of polynomial combinations of our features, up to a certain degree. Consequently we can work with slightly "bent" decision boundaries, such as parabolas with degree=2.

As you can see, we separated even more points! Ok, can we get all of them by using higher-order polynomials? Let's try degree 4!

Unfortunately not. Why? Because polynomial combinations are not flexible enough. They will not "bend" our space hard enough to capture what we want (and maybe that is not so bad? I mean, look at this point: it looks like an outlier!).
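A hedged sketch of the degree-2 vs. degree-4 comparison, again with scikit-learn and a made-up non-linear dataset (make_moons), since the original plots come from the applet:

```python
# Polynomial kernels of degree 2 and 4 on a toy non-linear dataset.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for degree in (2, 4):
    clf = SVC(kernel="poly", degree=degree, coef0=1.0, C=1.0).fit(X, y)
    print(f"degree={degree}  training accuracy={clf.score(X, y):.2f}")
```

The higher degree bends the boundary more but still may not capture every point, and chasing the last few points is usually chasing outliers.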

RBF kernel

Here the induced space is a space of Gaussian distributions... each point becomes (up to scaling) the probability density function of a normal distribution. In such a space, dot products are integrals (since we really do have an infinite number of dimensions!), and consequently we have extreme flexibility; in fact, with such a kernel you can separate everything (but is that a good thing?)
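The flexibility (and its danger) is easy to see in code. A hedged sketch with scikit-learn, where the gamma values are arbitrary choices of mine: a large gamma makes the kernel very "local" and lets the model memorize the training set.

```python
# RBF-kernel SVM on the same kind of toy data, with a small and a large gamma.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for gamma in (0.5, 50.0):
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    print(f"gamma={gamma}  training accuracy={clf.score(X, y):.2f}")
```

High training accuracy with a large gamma is exactly the overfitting risk listed in the comparison below.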

Rough comparison

Ok, so what are the main differences? I will now rank these three kernels along a few measures:

  • time of SVM learning: linear < poly < rbf
  • ability to fit any data: linear < poly < rbf
  • risk of overfitting: linear < poly < rbf
  • risk of underfitting: rbf < poly < linear
  • number of hyperparameters: linear (0) < rbf (2) < poly (3)
  • how "local" is particular kernel: linear < poly < rbf

So which one should you choose? It depends. Vapnik and Cortes (the inventors of the SVM) made a strong case for the idea that you should always try to fit the simplest model possible, and only if it underfits go for more complex ones. So you should generally start with a linear model (a linear kernel, in the case of SVM), and only if it gets really bad scores switch to poly/rbf (but remember that they are much harder to work with due to the number of hyperparameters).
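One way to act on that advice is to cross-validate the simple kernel first and only then bring in the hyperparameter-heavy ones. A hedged sketch, with illustrative parameter grids of my own choosing:

```python
# Compare linear, polynomial and RBF kernels with cross-validation.
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

grids = {
    "linear": {"kernel": ["linear"], "C": [0.1, 1, 10]},
    "poly":   {"kernel": ["poly"], "C": [0.1, 1, 10],
               "degree": [2, 3, 4], "coef0": [0, 1]},
    "rbf":    {"kernel": ["rbf"], "C": [0.1, 1, 10],
               "gamma": [0.1, 1, 10]},
}

for name, grid in grids.items():
    search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
    print(f"{name}: best CV accuracy={search.best_score_:.2f}  "
          f"params={search.best_params_}")
```

If the linear kernel already scores well in cross-validation, there is little reason to pay for the larger search spaces of poly and RBF.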

All images were made with a nice applet on the libSVM site. Give it a try; nothing gives you more intuition than lots of images and interaction :-) https://www.csie.ntu.edu.tw/~cjlin/libsvm/
