Computing Standard Deviation in a stream


Problem description

Using Python, assume I'm running through a known quantity of items I, and can time how long each one takes to process (t), as well as keep a running total of time spent processing T and the number of items processed so far c. I'm currently calculating the average on the fly as A = T / c, but this can be skewed by, say, a single item taking an extraordinarily long time to process (a few seconds compared to a few milliseconds).

I would like to show a running Standard Deviation. How can I do this without keeping a record of each t?

Recommended answer

I use Welford's Method, which gives more accurate results. John D. Cook has written an overview of it; here's a paragraph from it that summarizes why it is a preferred approach:

This better way of computing variance goes back to a 1962 paper by B. P. Welford and is presented in Donald Knuth’s Art of Computer Programming, Vol 2, page 232, 3rd edition. Although this solution has been known for decades, not enough people know about it. Most people are probably unaware that computing sample variance can be difficult until the first time they compute a standard deviation and get an exception for taking the square root of a negative number.
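
For illustration, here is a minimal sketch of how Welford's method might look in Python for the timing scenario in the question. The class name RunningStats and its method names are my own and do not come from the answer or from Cook's overview; the sketch assumes you want the sample standard deviation (dividing by c - 1) and that reporting 0.0 before two items have been seen is acceptable.

```python
import math


class RunningStats:
    """Running mean and standard deviation via Welford's method.

    Hypothetical helper for illustration; stores only three numbers,
    never the individual timings.
    """

    def __init__(self):
        self.count = 0    # c: number of items processed so far
        self.mean = 0.0   # A: running mean of the timings
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def push(self, t):
        """Fold one new timing t into the running statistics."""
        self.count += 1
        delta = t - self.mean               # deviation from the old mean
        self.mean += delta / self.count     # update the mean
        self._m2 += delta * (t - self.mean) # uses old and new mean

    def variance(self):
        """Sample variance; 0.0 until at least two items were seen (assumption)."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)

    def stddev(self):
        """Sample standard deviation of the timings seen so far."""
        return math.sqrt(self.variance())


# Usage: call push() once per processed item, then read stddev() at any time.
stats = RunningStats()
for t in [0.002, 0.003, 0.002, 2.5, 0.004]:  # made-up timings in seconds
    stats.push(t)
print(stats.count, stats.mean, stats.stddev())
```

Because the accumulated quantity only ever has non-negative terms added to it, this form should not produce the negative variance that the quoted paragraph warns about with the naive sum-of-squares formula.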
