Adding/Combining Standard Deviations

Question

Short Version:
Can standard deviations be added/combined? That is,

if StdDev(11,14,16,17) = X and StdDev(21,34,43,12) = Y,
can we calculate StdDev(11,14,16,17,21,34,43,12) from X and Y?

Long Version:
I am designing a star schema. The schema has a fact_table (grain = transaction) that stores the response_time of each individual transaction. It also has an aggregate_table (grain = day) that stores response_time_sum per day.
In my report I need to calculate the standard deviation of the response time over a given time dimension (day, week, month, etc.). How can I calculate the standard deviation from the aggregate_table instead of touching the huge fact_table?

Solution

Yes, you can combine them. You need to know the number of observations, mean, and standard deviation for each day. The variance is easier to work with than the standard deviation, so I'll express everything else in terms of variance. (Standard deviation is defined as the square root of the variance.)

Denote:

n[i]  # number of observations for day i
m[i]  # mean for day i
v[i]  # variance for day i
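As a sketch of where these per-day values might come from, the snippet below computes n[i], m[i] and v[i] from raw response times with Python's standard `statistics` module (`fmean` needs Python 3.8+). The dictionary `raw` and its day keys are hypothetical illustration data (the two groups from the question treated as two days). In the star schema these values would be precomputed into the daily aggregate, so that table would need a count and a variance (or equivalent) column in addition to response_time_sum.

```python
from statistics import fmean, pvariance

# Hypothetical raw response times grouped by day (the question's two groups).
raw = {
    "day1": [11, 14, 16, 17],
    "day2": [21, 34, 43, 12],
}

# Per-day aggregates, as they might be stored in a daily aggregate table.
n = {day: len(xs) for day, xs in raw.items()}        # observations per day
m = {day: fmean(xs) for day, xs in raw.items()}      # mean per day
v = {day: pvariance(xs) for day, xs in raw.items()}  # population variance per day

print(n)  # {'day1': 4, 'day2': 4}
print(m)  # {'day1': 14.5, 'day2': 27.5}
print(v)  # {'day1': 5.25, 'day2': 141.25}
```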

You'll need to calculate the total number of observations N and the overall mean M. This is easy:

days = [day1, day2, ..., day_final]
N = sum(n[i] for i in days)
M = sum(n[i] * m[i] for i in days) / N
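A runnable version of these two sums, assuming the per-day aggregates are held in plain Python dicts keyed by day (keys and values are hypothetical, taken from the question's two groups):

```python
# Hypothetical per-day aggregates: the two groups from the question treated as two days.
n = {"day1": 4, "day2": 4}        # observations per day
m = {"day1": 14.5, "day2": 27.5}  # mean per day

days = list(n)  # plays the role of [day1, day2, ..., day_final]
N = sum(n[i] for i in days)
M = sum(n[i] * m[i] for i in days) / N  # count-weighted mean of the daily means

print(N, M)  # 8 21.0
```

Weighting each daily mean by n[i] matters: a plain average of the daily means would only be correct if every day had the same number of observations.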

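The overall variance V is more complicated, but can still be calculated. It combines the spread within each day (s1) with the spread of the daily means around the overall mean (s2):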

s1 = sum(n[i] * v[i] for i in days)
s2 = sum(n[i] * (m[i] - M)**2 for i in days)
V = (s1 + s2) / N
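The sketch below wraps the population formulas in a hypothetical helper `combine_population` and checks the result against `statistics.pvariance` applied to the concatenated raw values, using the question's two groups as two "days". Only n[i], m[i] and v[i] enter the helper; the raw lists are used solely for the cross-check.

```python
from math import isclose, sqrt
from statistics import fmean, pvariance

# Illustrative data: the two groups from the question, treated as two days.
groups = {
    "day1": [11, 14, 16, 17],
    "day2": [21, 34, 43, 12],
}


def combine_population(n, m, v):
    """Merge per-day (count, mean, population variance) into overall N, M, V (hypothetical helper)."""
    days = list(n)
    N = sum(n[i] for i in days)
    M = sum(n[i] * m[i] for i in days) / N
    s1 = sum(n[i] * v[i] for i in days)             # spread within each day
    s2 = sum(n[i] * (m[i] - M) ** 2 for i in days)  # spread of daily means around M
    return N, M, (s1 + s2) / N


# Per-day aggregates: only these enter the combination.
n = {d: len(xs) for d, xs in groups.items()}
m = {d: fmean(xs) for d, xs in groups.items()}
v = {d: pvariance(xs) for d, xs in groups.items()}

N, M, V = combine_population(n, m, v)
std_dev = sqrt(V)  # overall population standard deviation

# Cross-check against the population variance of all raw values at once.
all_values = [x for xs in groups.values() for x in xs]
assert isclose(V, pvariance(all_values))
print(V)  # 115.5
```

For these groups V = 115.5, so the combined standard deviation is sqrt(115.5), roughly 10.75. Dropping s2 would give only 73.25, which shows why X and Y alone are not enough: the daily means and counts are also required.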

The above is for the population variance. If v[i] is instead the sample variance, s1 and V need minor modifications:

s1_sample = sum((n[i] - 1) * v[i] for i in days)
V_sample = (s1_sample + s2) / (N - 1)
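A matching sketch for the sample-variance case, again with a hypothetical helper (`combine_sample`) and the question's two groups as illustrative data. Here v[i] comes from `statistics.variance` (sample variance), and the combined result is cross-checked against `statistics.variance` on all raw values.

```python
from math import isclose
from statistics import fmean, variance

# Illustrative data: the two groups from the question, treated as two days.
groups = {
    "day1": [11, 14, 16, 17],
    "day2": [21, 34, 43, 12],
}


def combine_sample(n, m, v):
    """Merge per-day (count, mean, sample variance) into overall N, M, V_sample (hypothetical helper)."""
    days = list(n)
    N = sum(n[i] for i in days)
    M = sum(n[i] * m[i] for i in days) / N
    # (n[i] - 1) * v[i] recovers each day's sum of squared deviations from its own mean.
    s1_sample = sum((n[i] - 1) * v[i] for i in days)
    s2 = sum(n[i] * (m[i] - M) ** 2 for i in days)
    return N, M, (s1_sample + s2) / (N - 1)


n = {d: len(xs) for d, xs in groups.items()}
m = {d: fmean(xs) for d, xs in groups.items()}
v = {d: variance(xs) for d, xs in groups.items()}  # sample variance per day

N, M, V_sample = combine_sample(n, m, v)

# Cross-check against the sample variance of all raw values at once.
all_values = [x for xs in groups.values() for x in xs]
assert isclose(V_sample, variance(all_values))
print(V_sample)
```

For these groups the combined sample variance is 132 (standard deviation roughly 11.49). Note that `statistics.variance` needs at least two data points, so a day with a single observation has no sample variance; because its weight (n[i] - 1) is 0, storing v[i] = 0 for such a day leaves the result unchanged.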
