在 SQL Server 中通过标准偏差消除异常值 [英] Eliminating outliers by standard deviation in SQL Server

查看:35
本文介绍了在 SQL Server 中通过标准偏差消除异常值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试通过标准偏差消除 SQL Server 2008 中的异常值.我只想要包含特定列中值在该列平均值的 +/- 1 标准偏差范围内的记录.

I am trying to eliminate outliers in SQL Server 2008 by standard deviation. I would like only records that contain a value in a specific column within +/- 1 standard deviation of that column's mean.

我怎样才能做到这一点?

How can I accomplish this?

推荐答案

如果您假设事件的钟形曲线分布,那么只有 68% 的值与平均值相差 1 个标准差(95% 被覆盖2 个标准差).

If you are assuming a bell curve distribution of events, then only 68% of values will be within 1 standard deviation away from the mean (95% are covered by 2 standard deviations).

我会加载一个具有您范围标准偏差的变量(使用 stdev/stdevp sql 函数),然后选择适当数量的标准偏差内的值.

I would load a variable with the standard deviation of your range (derived using stdev / stdevp sql function) and then select the values that are within the appropriate number of standard deviations.

declare @stdtest table (colname varchar(20), colvalue int)

insert into @stdtest (colname, colvalue) values ('a', 2)
insert into @stdtest (colname, colvalue) values ('b', 4)
insert into @stdtest (colname, colvalue) values ('c', 4)
insert into @stdtest (colname, colvalue) values ('d', 4)
insert into @stdtest (colname, colvalue) values ('e', 5)
insert into @stdtest (colname, colvalue) values ('f', 5)
insert into @stdtest (colname, colvalue) values ('g', 7)
insert into @stdtest (colname, colvalue) values ('h', 9)

declare @std decimal
declare @mean decimal
declare @lower decimal
declare @higher decimal
declare @noofstds int

select @std = STDEV(colvalue), @mean = AVG(colvalue) from @stdtest

--68%
set @noofstds = 1
select @lower = @mean - (@noofstds * @std)
select @higher = @mean + (@noofstds * @std)

select @lower, @higher, * from @stdtest where colvalue between @lower and @higher

--returns rows with a colvalue between 3 and 7 inclusive

--95%
set @noofstds = 2
select @lower = @mean - (@noofstds * @std)
select @higher = @mean + (@noofstds * @std)

select @lower, @higher, * from @stdtest where colvalue between @lower and @higher

--returns rows with a colvalue between 1 and 9 inclusive

这篇关于在 SQL Server 中通过标准偏差消除异常值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆