计算标准差的在线算法 [英] Online algorithm for calculating standard deviation

查看:76
本文介绍了计算标准差的在线算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

通常,我有一个更技术性的问题,但我会用一个数球的例子为你简化它.

Normally, I have a more technical problem but I will simplify it for you with an example of counting balls.

假设我有不同颜色的球,并且为每种颜色保留了一个数组索引(初始化为全 0).每次我选择一个球,我都会将相应的索引增加 1.

Assume I have balls of different colors and one index of an array (initialized to all 0's) reserved for each color. Every time I pick a ball, I increment the corresponding index by 1.

球是随机挑选的,我一次只能挑选一个球.我唯一的目的是计算每种颜色的球数,直到我用完球.

Balls are picked randomly and I can only pick one ball at a time. My sole purpose is to count number of balls for every color, until I run out of balls.

我想计算不同颜色的球数量的标准偏差,在我计算它们的同时.我不想在计算完所有球后不得不再次遍历数组来计算它.

I would like to calculate standard deviation of the number of balls of different colors, while I am counting them. I do not want to calculate it by having to iterate through the array once more after I am done with counting all the balls.

形象化:

随机排列的球:BBGRRYYBBGGGGGGB(每个字母代表一种颜色的第一个字母)从 0 到 3 的数组索引分别对应于颜色 B、G、R 和 Y.当我选择完球后,我的数组看起来像 [5,7,2,2].

Balls in random order: BBGRRYYBBGGGGGGB (each letter represents first letter of a color) Array indices from 0 to 3 correspond to colors B, G, R and Y respectively. When I am done picking the balls, my array looks like [5,7,2,2].

在获得最终数组后计算标准偏差非常简单,但我想在填充这个数组时进行.

It is very simple to calculate standard deviation after having the final array but I want to do it while I am filling this array.

我想用 Java 来做,我有大约 1000 种颜色.

I want to do it in Java and I have approximately 1000 colors.

最有效的实现方式是什么?或者在拿到最终数组之前有没有办法做到这一点?

What is the most efficient way to implement that? Or is there even a way to do it before having the final array in hand?

推荐答案

由于平均值和标准偏差是使用总和计算的,因此您可以轻松地为这些实现合适的累加器.然后,当您需要实际值时,完成其余的计算(特别是除法).

Since average and standard deviation are calculated using sums, you can easily implement appropriate accumulators for these. Then when you want the actual values, finish the rest of the calculation (particularly, the divisions).

平方和是棘手的部分,因为您要为每个输入增加一个频率.解决此问题的一种方法是维护迄今为止看到的每种颜色的计数(使用适当的数据结构).然后,当您在输入中看到一种颜色时,您可以减去它之前的方块并重新添加新的方块(或等效地将两个方块的差值添加到您的累加器中).

The sum of squares is the tricky part since you increment one of the frequencies for each input. One way to deal with this is to maintain a count of each color seen so far (using an appropriate data structure). Then when you see a color in the input, you can subtract out its previous square and add the new square back in (or equivalently add the difference of the two squares to your accumulator).

我将把它留给读者来实现这里描述的算法.

I'll leave it to the reader to implement the algorithm described here.

这篇关于计算标准差的在线算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆