One class SVM probability estimates and what is the difference between one class SVM and clustering

Problem description

I have a set of images. I would like to learn a one-class SVM (OC-SVM) to model the distribution of a particular class (positive), as I don't have enough examples to represent the other classes (negative). My understanding of the OC-SVM is that it tries to separate the data from the origin, or in other words, it tries to learn a hypersphere that fits the one-class data.

My questions are:

  1. If I want to use the output of the OC-SVM as a probability estimate, how can I do it?

  2. What is the difference between the OC-SVM and any clustering algorithm (e.g. k-means)?

Recommended answer

If you want a probability estimate, don't use a one-class SVM. This is not what they were designed for. You want something like kernel density estimation, which provides a non-parametric density estimate given some positive examples.

The difference between a one-class SVM and clustering is that in clustering, you're given points from several classes but you don't know which points correspond to which classes: this is the goal of inference (and you may also end up with density estimates for the classes and the marginal density over all of feature space too). The one-class SVM is given points only from one class, and expected to learn a separation between members of that class and anything else.
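To make that setup concrete, here is a minimal sketch using scikit-learn's `OneClassSVM`: it is trained on points from a single class only, and then labels new points as inside (+1) or outside (-1) the learned region. The kernel, `gamma`, and `nu` values and the synthetic data are illustrative assumptions.

```python
# Sketch: one-class SVM trained on a single class, used to flag outliers.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # positive class only

# nu bounds the fraction of training points treated as outliers
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)

# predict returns +1 for points inside the learned region, -1 outside
inlier = clf.predict(np.array([[0.1, -0.2]]))[0]
outlier = clf.predict(np.array([[6.0, 6.0]]))[0]
print(inlier, outlier)
```

Note that `decision_function` gives a signed distance to the boundary, not a probability, which is exactly why the answer steers you toward density estimation instead.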

Clustering is not the same as density estimation. Clustering is concerned with determining which instances belong to which classes (clusters), when the assignments are not given, and does not necessarily result in a similarity score between the supplied examples and any point in input space.

If the goal is to say, how similar is this new instance to the positive training examples I've seen, then what you do is fit a probability distribution to your training examples, then evaluate the density function at the new point. If this density falls below a threshold, you say the new point is outside of the class defined by the supplied examples.
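The fit-then-threshold recipe above can be sketched with scikit-learn's `KernelDensity`. The Gaussian kernel, the bandwidth, and the choice of the 5th percentile of training log-densities as the threshold are all illustrative assumptions, not part of the original answer.

```python
# Sketch: fit a kernel density estimate to positive examples, then
# threshold the density at a new point to decide class membership.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # positive examples only

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_pos)

# score_samples returns log-density; set the threshold so ~95% of the
# training points themselves count as in-class
threshold = np.percentile(kde.score_samples(X_pos), 5)

def in_class(point):
    """True if the density at `point` is above the chosen threshold."""
    return kde.score_samples(np.array([point]))[0] >= threshold

print(in_class([0.0, 0.0]), in_class([8.0, 8.0]))
```

In practice the bandwidth matters a lot and is usually chosen by cross-validation rather than fixed by hand.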

You can build a parametric model of the class if you like, but this is usually tricky unless you either know something about the problem or are willing to take a standard distribution (multi-variate normal or Naive Bayes being the two obvious ones). So, the alternative is to use a non-parametric density estimate. This is the kernel density estimation I mentioned.
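For completeness, the parametric route mentioned above (a multivariate normal) is only a few lines with scipy; the synthetic data and the comparison points are assumptions made for illustration.

```python
# Sketch: parametric alternative -- fit a multivariate normal to the
# positive examples and evaluate its density at new points.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[1.0, -1.0], scale=1.0, size=(300, 2))

mu = X_pos.mean(axis=0)          # sample mean
cov = np.cov(X_pos, rowvar=False)  # sample covariance
model = multivariate_normal(mean=mu, cov=cov)

# Higher density means the point looks more like the training class
print(model.pdf([1.0, -1.0]) > model.pdf([5.0, 5.0]))
```

This works well when the class really is unimodal and roughly Gaussian; otherwise the non-parametric kernel density estimate is the safer choice.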
