Custom kernels for SVM, when to apply them?


Question


I am new to the machine learning field and am currently trying to get a grasp of how the most common learning algorithms work and when to apply each of them. At the moment I am learning how Support Vector Machines work and have a question about custom kernel functions.
There is plenty of information on the web about the more standard (linear, RBF, polynomial) kernels for SVMs. I, however, would like to understand when it is reasonable to go for a custom kernel function. My questions are:


1) What other kernels are possible for SVMs?
2) In which situations would one apply custom kernels?
3) Can a custom kernel substantially improve the prediction quality of an SVM?

Answer


1) What are other possible kernels for SVMs?


There are infinitely many of these; see for example the list implemented in pykernels (which is far from exhaustive):

https://github.com/gmum/pykernels

  • Linear
  • Polynomial
  • RBF
  • Cosine similarity
  • Exponential
  • Laplacian
  • Rational quadratic
  • Inverse multiquadratic
  • Cauchy
  • T-Student
  • ANOVA
  • Additive Chi^2
  • Chi^2
  • MinMax
  • Min/Histogram intersection
  • Generalized histogram intersection
  • Spline
  • Sorensen
  • Tanimoto
  • Wavelet
  • Fourier
  • Log (CPD)
  • Power (CPD)
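Several of the kernels listed above (e.g. histogram intersection) are not built into the usual SVM libraries, but most libraries accept an arbitrary Gram-matrix function. As a minimal sketch, assuming scikit-learn (whose `SVC` accepts any callable that returns the kernel matrix) and hypothetical toy data:

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection(X, Y):
    # Gram matrix: K[i, j] = sum_k min(X[i, k], Y[j, k])
    # (a valid PSD kernel for non-negative, histogram-like features)
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

# hypothetical non-negative "histogram" features
rng = np.random.default_rng(0)
X = rng.random((40, 5))
y = (X[:, 0] > 0.5).astype(int)

# SVC calls the function both at fit time and at predict time
clf = SVC(kernel=histogram_intersection).fit(X, y)
print(clf.score(X, y))
```

The callable receives two sample matrices and must return the `(len(X), len(Y))` kernel matrix; alternatively, one can precompute the Gram matrix and pass `kernel="precomputed"`.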


2) In which situation one would apply custom kernels?

Basically there are two cases:

  • the "simple" ones give very bad results
  • the data is specific in some sense, so that applying traditional kernels would require degenerating it. For example, if your data comes in graph format, you cannot apply an RBF kernel, since a graph is not a constant-size vector; you need a graph kernel to work with such objects without some information-losing projection. Sometimes you also have insight into the data, knowledge about some underlying structure, which might help the classifier. One such example is periodicity: you know that there is a kind of recurring effect in your data, so it might be worth looking for a specific kernel, etc.
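The periodicity case can be made concrete with an exp-sine-squared (periodic) kernel, a sketch assuming scikit-learn, a hypothetical period `P` known from domain knowledge, and synthetic data whose labels repeat with that period:

```python
import numpy as np
from sklearn.svm import SVC

P = 2.0  # assumed known period of the phenomenon (domain knowledge)

def periodic_kernel(X, Y, length=0.5):
    # exp-sine-squared kernel: treats x and x + P as the same point,
    # encoding the prior that the decision function is P-periodic
    d = X[:, None, :] - Y[None, :, :]       # pairwise differences
    s = np.sin(np.pi * d / P) ** 2          # periodic squared distance
    return np.exp(-2.0 * s.sum(axis=2) / length ** 2)

# hypothetical data: the label depends only on x mod P
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = (np.sin(2 * np.pi * X[:, 0] / P) > 0).astype(int)

clf = SVC(kernel=periodic_kernel).fit(X, y)
print(clf.score(X, y))
```

An RBF kernel sees points one period apart as distant; the periodic kernel encodes the recurring structure directly, which is exactly the kind of prior knowledge the answer refers to.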


3) Can custom kernel substantially improve prediction quality of SVM?


Yes; in particular, there always exists a (hypothetical) Bayes-optimal kernel, defined as:

K(x, y) = 1 iff arg max_l P(l|x) == arg max_l P(l|y)


In other words, if one knows the true probability P(l|x) of label l being assigned to a point x, then one can create a kernel that essentially maps the data points onto one-hot encodings of their most probable labels, leading to Bayes-optimal classification (as it attains the Bayes risk).
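On toy data where the posteriors P(l|x) are (hypothetically) known, this kernel can be written down directly; a sketch only, since in practice P(l|x) is precisely what is unknown:

```python
import numpy as np

# hypothetical known posteriors P(l|x) over 2 labels for 4 points
P_l_given_x = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.1, 0.9],
])
most_probable = P_l_given_x.argmax(axis=1)  # arg max_l P(l|x) per point

def bayes_optimal_kernel(i, j):
    # K(x, y) = 1 iff arg max_l P(l|x) == arg max_l P(l|y)
    return float(most_probable[i] == most_probable[j])

n = len(P_l_given_x)
K = np.array([[bayes_optimal_kernel(i, j) for j in range(n)] for i in range(n)])
print(K)
```

The resulting Gram matrix is block-diagonal over the most-probable-label classes, i.e. points are mapped onto one-hot encodings of their Bayes-optimal labels, so any sensible classifier on top of it attains the Bayes risk.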


In practice it is of course impossible to obtain such a kernel, as that would mean you had already solved your problem. However, it shows that there is a notion of an "optimal kernel", and obviously none of the classical ones is of this type (unless your data comes from a very simple distribution). Furthermore, each kernel is a kind of prior over decision functions: the closer the induced family of functions gets to the actual one, the more likely you are to get a reasonable classifier with an SVM.

