在 SQL Server 中通过标准偏差消除异常值 [英] Eliminating outliers by standard deviation in SQL Server
问题描述
我正在尝试通过标准偏差消除 SQL Server 2008 中的异常值.我只想要包含特定列中值在该列平均值的 +/- 1 标准偏差范围内的记录.
I am trying to eliminate outliers in SQL Server 2008 by standard deviation. I would like only records that contain a value in a specific column within +/- 1 standard deviation of that column's mean.
我怎样才能做到这一点?
How can I accomplish this?
推荐答案
如果您假设事件的钟形曲线分布,那么只有 68% 的值与平均值相差 1 个标准差(95% 被覆盖2 个标准差).
If you are assuming a bell curve distribution of events, then only 68% of values will be within 1 standard deviation away from the mean (95% are covered by 2 standard deviations).
我会加载一个具有您范围标准偏差的变量(使用 stdev/stdevp sql 函数),然后选择适当数量的标准偏差内的值.
I would load a variable with the standard deviation of your range (derived using stdev / stdevp sql function) and then select the values that are within the appropriate number of standard deviations.
declare @stdtest table (colname varchar(20), colvalue int)
insert into @stdtest (colname, colvalue) values ('a', 2)
insert into @stdtest (colname, colvalue) values ('b', 4)
insert into @stdtest (colname, colvalue) values ('c', 4)
insert into @stdtest (colname, colvalue) values ('d', 4)
insert into @stdtest (colname, colvalue) values ('e', 5)
insert into @stdtest (colname, colvalue) values ('f', 5)
insert into @stdtest (colname, colvalue) values ('g', 7)
insert into @stdtest (colname, colvalue) values ('h', 9)
declare @std decimal
declare @mean decimal
declare @lower decimal
declare @higher decimal
declare @noofstds int
select @std = STDEV(colvalue), @mean = AVG(colvalue) from @stdtest
--68%
set @noofstds = 1
select @lower = @mean - (@noofstds * @std)
select @higher = @mean + (@noofstds * @std)
select @lower, @higher, * from @stdtest where colvalue between @lower and @higher
--returns rows with a colvalue between 3 and 7 inclusive
--95%
set @noofstds = 2
select @lower = @mean - (@noofstds * @std)
select @higher = @mean + (@noofstds * @std)
select @lower, @higher, * from @stdtest where colvalue between @lower and @higher
--returns rows with a colvalue between 1 and 9 inclusive
这篇关于在 SQL Server 中通过标准偏差消除异常值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!