Statistical outlier detection in MATLAB

Problem description

Suppose we have this matrix:

main = [10000   5   3   1;
5   5677    0   134;
1   1   456 3];

The following rule is the most widely used method in econometrics and statistical problems. X is the data we are searching for outliers in, and n is the number of standard deviations used as the cutoff:

X-mean(X)>= n*std(X)

So if this inequality holds, that sample is an outlier; otherwise we keep the sample.
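A minimal sketch of the rule applied to a single vector, assuming hypothetical data `x` and a cutoff of `n = 2`. It uses the absolute deviation, so values far below the mean are flagged as well as values far above it (the inequality as written only catches the upper side).

```matlab
% Hypothetical data: nine typical values and one extreme value.
x = [5 5 5 5 5 5 5 5 5 15];
n = 2;                        % cutoff, in standard deviations

mu    = mean(x);              % sample mean (6 here)
sigma = std(x);               % sample standard deviation, N-1 normalization (sqrt(10) here)

% Two-sided form of the rule: flag samples at least n*sigma away from the mean.
isOut = abs(x - mu) >= n*sigma;

outIdx = find(isOut)          % index of the outlier
kept   = x(~isOut)            % samples we keep
```

Here the deviation of the last sample is 9, which exceeds 2*sqrt(10) (about 6.32), while every other sample deviates by only 1.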

Now my question. I want to find the outliers with this code:

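meann = mean(main);   % column means
stdd = std(main);     % column standard deviations
% linear indices of entries at least 2 std above their column mean
out = find(main-repmat(meann,size(main,1),1)>=repmat(2*stdd,size(main,1),1));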

We are searching for outliers in every column, so out should hold the indices of the outliers. In the final step we want to remove the outliers from every column. Is there a simpler function or method to do this in MATLAB?

Thanks.
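Removing outliers column by column leaves columns of different lengths, which a numeric matrix cannot hold. A minimal sketch of two common workarounds, assuming hypothetical data `data` (10 rows, 2 columns) and a 2-standard-deviation cutoff: replace outliers with NaN so the matrix keeps its shape, or collect each column's remaining values in a cell array. The names `isOut`, `cleaned` and `perColumn` are illustrative only.

```matlab
% Hypothetical data: 10 observations (rows) of 2 variables (columns).
data = [5 5 5 5 5 5 5 5 5 15;
        1 2 3 4 5 6 7 8 9 10]';
n = 2;                                   % cutoff in standard deviations

mu    = mean(data);                      % 1-by-2 row of column means
sigma = std(data);                       % 1-by-2 row of column standard deviations

% Logical mask: true where an entry is at least n*sigma from its column mean.
isOut = bsxfun(@ge, abs(bsxfun(@minus, data, mu)), n*sigma);

% Option 1: keep the matrix rectangular by marking outliers as NaN.
cleaned = data;
cleaned(isOut) = NaN;

% Option 2: keep only the non-outliers of each column, one cell per column.
perColumn = cell(1, size(data, 2));
for k = 1:size(data, 2)
    perColumn{k} = data(~isOut(:, k), k);
end
```

If a whole observation should be discarded whenever any of its variables is an outlier, `data(~any(isOut, 2), :)` drops those rows instead.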

Recommended answer

If you want to find the entries more than 2 standard deviations away from the mean on a per-column basis, I would use bsxfun rather than repmat, like this:

meann = mean(main)   % column means
stdd = std(main)     % column standard deviations

% true where |main - column mean| > 2*column std
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd)
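On MATLAB R2016b or later, arithmetic and relational operators expand singleton dimensions automatically (implicit expansion), so the same mask can be written without bsxfun. A minimal sketch comparing the two forms on the question's `main`; `I1` and `I2` are hypothetical names.

```matlab
main = [10000   5    3    1;
            5 5677   0  134;
            1    1  456   3];

meann = mean(main);
stdd  = std(main);

% bsxfun form from the answer
I1 = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd);

% Same test using implicit expansion (MATLAB R2016b or later)
I2 = abs(main - meann) > 2*stdd;

isequal(I1, I2)     % both forms give the same mask
any(I1(:))          % no entry is flagged for this 3-row matrix
```

On this particular `main` the mask is all false. With only three rows, no value can lie more than (3 - 1)/sqrt(3), about 1.155, sample standard deviations from its column mean (Samuelson's inequality), so a 2-standard-deviation cutoff cannot flag anything; the rule only becomes meaningful with more observations per column.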

I would stop at I, since this logical mask is what lets you remove the outliers. However, you can call find on it if you like:

out = find(I)
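`find` on a logical matrix returns linear indices in column-major order; called with two outputs it returns row and column subscripts instead, which shows directly which column each outlier belongs to. A small sketch using a hypothetical mask `I`:

```matlab
% Hypothetical 3-by-2 outlier mask
I = logical([0 0;
             0 1;
             1 0]);

out = find(I)                      % linear indices, column-major
[r, c] = find(I)                   % row and column subscripts of the same entries
[r2, c2] = ind2sub(size(I), out);  % converts linear indices to the same subscripts
```

Here `out` is `[3; 5]`, and both `[r, c]` and `[r2, c2]` give rows `[3; 2]` in columns `[1; 2]`.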

Although to me it is more intuitive to write the test as explicit upper and lower bounds:

I = bsxfun(@lt, meann + 2*stdd, main) | bsxfun(@gt, meann - 2*stdd, main)
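The bounds form flags entries above `meann + 2*stdd` or below `meann - 2*stdd`, which is the same test as the abs form (up to rounding for values lying exactly on a bound). A minimal check on hypothetical data `data` with one outlier above the mean in the first column and one below the mean in the second; `Iabs` and `Ibounds` are illustrative names.

```matlab
% Hypothetical data: column 1 has a high outlier, column 2 a low outlier.
data  = [5 5 5 5 5 5 5 5 5 15;
         5 5 5 5 5 5 5 5 5 -5]';
meann = mean(data);
stdd  = std(data);

% absolute-deviation form
Iabs = bsxfun(@gt, abs(bsxfun(@minus, data, meann)), 2*stdd);

% explicit upper/lower bound form
Ibounds = bsxfun(@lt, meann + 2*stdd, data) | bsxfun(@gt, meann - 2*stdd, data);

isequal(Iabs, Ibounds)   % the two masks agree
find(Ibounds)            % linear indices 10 and 20 (row 10 of each column)
```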

By the way, I think your repmat solution is missing an abs; without it, only values far above the mean are detected.
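Without abs, `X - mean(X) >= n*std(X)` only flags values far above the mean. A minimal sketch with a hypothetical column `x` whose outlier lies below the mean, comparing the question's repmat test with and without abs; `oneSided` and `twoSided` are illustrative names.

```matlab
% Hypothetical column whose outlier lies below the mean.
x = [5 5 5 5 5 5 5 5 5 -5]';
meann = mean(x);   % 4
stdd  = std(x);    % sqrt(10)

% Question's test as written (no abs): misses the low outlier
oneSided = find(x - repmat(meann, size(x, 1), 1) >= repmat(2*stdd, size(x, 1), 1))

% Same test with abs: deviations in both directions are caught
twoSided = find(abs(x - repmat(meann, size(x, 1), 1)) >= repmat(2*stdd, size(x, 1), 1))
```

`oneSided` comes back empty, while `twoSided` finds row 10.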
