Statistical outlier detection in MATLAB
Question
Suppose we have this matrix:
main = [10000 5 3 1;
5 5677 0 134;
1 1 456 3];
This is the most widely used method in econometrics and statistical problems. X is the data in which we are searching for outliers:
X-mean(X)>= n*std(X)
So if this inequality holds, that sample is an outlier; otherwise we keep the sample. Here n is the chosen number of standard deviations (the code below uses n = 2).
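A minimal sketch of the criterion applied to a single column vector, with n as a parameter. The anonymous function name flagOutliers and the sample vector x are hypothetical, chosen only for illustration. The test is written two-sided with abs, so it also catches values far below the mean. It uses std's default N-1 normalization.

```matlab
% Hypothetical helper: true where x(i) lies at least n standard
% deviations (N-1 normalization, MATLAB's std default) from mean(x).
flagOutliers = @(x, n) abs(x - mean(x)) >= n * std(x);

x = [1 2 3 4 5 6 7 8 9 100]';    % hypothetical sample data (column vector)

mask2  = flagOutliers(x, 2);      % logical mask for n = 2
outIdx = find(mask2)              % indices of the flagged samples

mask3  = flagOutliers(x, 3);      % same data, stricter threshold n = 3
```

The choice of n matters. Here mean(x) = 14.5 and std(x) is about 30.15. With n = 2, the value 100 is flagged because its deviation of 85.5 exceeds 60.3. With n = 3, it is not flagged, because 85.5 is below about 90.5.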
Now my question. I want to find outliers with this code:
meann = mean(main);
stdd = std(main);
out = find(main-repmat(meann,size(main,1),1)>=repmat(2*stdd,size(main,1),1));
We are searching for outliers in every column, and out should hold the indices of the outliers. In the final step, we should remove the outliers from every column. Is there a simpler function or method to do this in MATLAB?
Thanks.
Recommended answer
If you want to find values more than 2 standard deviations away from the mean on a per-column basis, I would use bsxfun rather than repmat, like this:
meann = mean(main)
stdd = std(main)
I = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd)
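On MATLAB R2016b and later, arithmetic and relational operators expand compatible sizes implicitly, so the bsxfun calls can be dropped. This sketch assumes such a release. It checks that the shorter form produces the same mask as the bsxfun version above. The names I1 and I2 are illustrative.

```matlab
main = [10000 5 3 1;
        5 5677 0 134;
        1 1 456 3];
meann = mean(main);   % 1-by-4 row of column means
stdd  = std(main);    % 1-by-4 row of column standard deviations

% bsxfun version from the answer
I1 = bsxfun(@gt, abs(bsxfun(@minus, main, meann)), 2*stdd);

% implicit expansion (R2016b+): the 1-by-4 rows are expanded across
% the 3 rows of main automatically
I2 = abs(main - meann) > 2*stdd;

sameMask = isequal(I1, I2)
```

For this particular 3-row matrix, both masks are all false. With n observations, no value can lie more than (n-1)/sqrt(n) sample standard deviations from the mean. For n = 3 that limit is about 1.155, so a 2-standard-deviation threshold can never be reached.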
I would stop at I, since this logical mask lets you remove the outliers directly. However, you can call find on it if you like:
out = find(I)
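A sketch of the final "remove outliers in every column" step, starting from a logical mask I like the one above. Columns can lose different numbers of elements, so the result no longer fits in a plain matrix. The sketch shows two common workarounds: mark the outliers as NaN, or keep each cleaned column in a cell array. The matrix A and the names B and cleaned are hypothetical.

```matlab
% Hypothetical data: under the 2-std rule, column 1 has one outlier
% (the 100 in row 10) and column 2 has none.
A  = [(1:9)', (10:10:90)'; 100, 100];
mu = mean(A);
sd = std(A);
I  = bsxfun(@gt, abs(bsxfun(@minus, A, mu)), 2*sd);   % logical outlier mask

% Option 1: keep the matrix shape and mark outliers as missing (NaN).
B = A;
B(I) = NaN;

% Option 2: drop outliers column by column. Columns may end up with
% different lengths, so each cleaned column goes into its own cell.
cleaned = cell(1, size(A, 2));
for j = 1:size(A, 2)
    cleaned{j} = A(~I(:, j), j);
end
```

Releases from R2017a onward also include a built-in, isoutlier. A call such as isoutlier(A, 'mean', 'ThresholdFactor', 2) should give a comparable per-column mask. Note that its default threshold for the 'mean' method is 3 standard deviations. Check the documentation for your release before relying on it.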
Although, to me, it is more intuitive to do this:
I = bsxfun(@lt, meann + 2*stdd, main) | bsxfun(@gt, meann - 2*stdd, main)
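The bounds form above (above mean + 2 std, or below mean - 2 std) is mathematically equivalent to the abs form (absolute deviation greater than 2 std). The sketch below checks this on a small hypothetical matrix A. The names upperBound and lowerBound are illustrative; they avoid shadowing the built-ins upper and lower. Exact agreement is only guaranteed up to floating-point rounding at the boundaries.

```matlab
A  = [(1:9)', (10:10:90)'; 100, 100];   % hypothetical data
mu = mean(A);
sd = std(A);
upperBound = mu + 2*sd;
lowerBound = mu - 2*sd;

% "above the upper bound OR below the lower bound"
Ibounds = bsxfun(@lt, upperBound, A) | bsxfun(@gt, lowerBound, A);

% "absolute deviation from the mean larger than 2 std"
Iabs = bsxfun(@gt, abs(bsxfun(@minus, A, mu)), 2*sd);

agree = isequal(Ibounds, Iabs)
```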
By the way, I think your repmat solution is missing an abs: without it, values far below the mean are never flagged.
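A small sketch of why the abs matters. It uses a hypothetical column y whose extreme value lies far below the mean. The question's one-sided repmat test misses that value, while the same test with abs flags it. The names y, oneSided and twoSided are illustrative.

```matlab
% Hypothetical column: the extreme value (-100) is far BELOW the mean.
y  = -[1 2 3 4 5 6 7 8 9 100]';
r  = size(y, 1);
mu = mean(y);
sd = std(y);

% One-sided test as in the question: only deviations above the mean
% can satisfy ">= 2*std", so the low outlier is missed.
oneSided = find(y - repmat(mu, r, 1) >= repmat(2*sd, r, 1));

% Same test with abs(): deviations in either direction are flagged.
twoSided = find(abs(y - repmat(mu, r, 1)) >= repmat(2*sd, r, 1));
```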