Outlier detection based on Gaussian mixture model

Question

I have a set of data and want to learn a one-class distribution from it. Using the learned distribution, I want to get a probability value for each data instance. By thresholding these probability values, I want to build a classifier that decides whether a particular data instance comes from that distribution or not.

In this case, let's say my data is 50x100000, where 50 is the dimension of each data instance and 100000 is the number of instances. I am learning a Gaussian mixture model from this data.

When I try to get the probability values for the instances, I get very low values. In this case, how can I build a classifier?
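For illustration, the setup described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the question does not name a library, so scikit-learn's `GaussianMixture` is assumed here; the data is a synthetic stand-in; and the number of components and the 1% cutoff are arbitrary choices. In 50 dimensions the raw density values are typically extremely small numbers, so the sketch thresholds the log-density returned by `score_samples` instead.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: rows are instances, columns are the 50 features.
# scikit-learn expects shape (n_samples, n_features), i.e. the transpose of 50x100000.
X_train = rng.normal(size=(2000, 50))

# The number of components is an arbitrary choice for this sketch.
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
gmm.fit(X_train)

# score_samples returns the log of the fitted density for each instance.
log_density = gmm.score_samples(X_train)

# Assumed rule: treat the lowest-scoring 1% of training instances as outliers.
threshold = np.percentile(log_density, 1.0)

def is_inlier(X):
    """Return True for rows whose log-density is at or above the threshold."""
    return gmm.score_samples(X) >= threshold

# Points placed far away from the training data should be rejected.
X_far = rng.normal(loc=6.0, size=(5, 50))
inlier_rate = is_inlier(X_train).mean()
far_flags = is_inlier(X_far)
```

Working on the log scale keeps the scores in a usable numeric range; the threshold is then a quantile of the training scores rather than an absolute probability.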

Recommended answer

I don't think this makes sense. For example, suppose your data is one-dimensional, and suppose the truth is that it was sampled from a bimodal distribution. But suppose you haven't worked out that it's bimodal, and you fit a normal distribution. You'd still have the best possible fit, but it would be the best possible fit to the wrong distribution, and the truth is that none of the points come from that distribution or from any distribution that looks like it.
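The one-dimensional example above can be reproduced with a small sketch. The sample sizes, the mode locations (-5 and +5), and the helper name `normal_pdf` are all assumptions made for illustration; only the Python standard library is used.

```python
import math
import random
import statistics

random.seed(0)

# Bimodal sample: half the points near -5, half near +5 (assumed values).
data = [random.gauss(-5.0, 1.0) for _ in range(5000)] + \
       [random.gauss(5.0, 1.0) for _ in range(5000)]

# Maximum-likelihood fit of a single normal distribution to the sample.
mu = statistics.fmean(data)
sigma = statistics.pstdev(data)

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Fitted density in the gap between the modes and at one of the true modes.
density_at_gap = normal_pdf(0.0, mu, sigma)
density_at_mode = normal_pdf(5.0, mu, sigma)

# How many observations actually fall near each location.
points_near_gap = sum(1 for x in data if abs(x) < 1.0)
points_near_mode = sum(1 for x in data if abs(x - 5.0) < 1.0)

print(f"fitted mean={mu:.2f}, std={sigma:.2f}")
print(f"density at gap={density_at_gap:.4f}, at mode={density_at_mode:.4f}")
print(f"points near gap={points_near_gap}, near mode={points_near_mode}")
```

In this sketch the fitted normal assigns its highest density to the region between the modes, where almost no data lies, so a threshold on that density would treat the empty gap as more typical than the points that were actually observed.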
