Computing stats on generators in a single pass. Python
Problem description
When working with generators, you can only pull out items in a single pass. An alternative is to load the generator into a list and make multiple passes, but this costs performance and memory.
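To make the single-pass constraint concrete, here is a small sketch (Python 3; the variable names are illustrative only) showing that a generator is exhausted after one traversal, and that materializing it into a list trades memory for the ability to iterate again.

```python
# A generator expression yields each value once and is then exhausted.
squares = (x * x for x in range(5))

first_pass = sum(squares)   # consumes every item: 0 + 1 + 4 + 9 + 16
second_pass = sum(squares)  # nothing left to consume, so the sum is 0

# Materializing into a list allows repeated passes, at the cost of
# holding every item in memory at once.
squares_list = [x * x for x in range(5)]
total = sum(squares_list)
largest = max(squares_list)

print(first_pass, second_pass, total, largest)
```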
Can anyone think of a better way of computing the following metrics from a generator in a single pass? Ideally the code computes the count, sum, average, standard deviation, max, min, and any other stats you can think of.
Update
Initial horrid code is in this gist: https://gist.github.com/3038746
Using the great suggestions from @larsmans, here is the final solution I went with. Using the named tuple really helped.
import random
from math import sqrt
from collections import namedtuple

def stat(gen):
    """Returns the namedtuple Stat as below."""
    Stat = namedtuple('Stat', 'total, sum, avg, sd, max, min')
    it = iter(gen)
    x0 = next(it)
    mx = mn = s = x0
    s2 = x0*x0
    n = 1
    for x in it:
        mx = max(mx, x)
        mn = min(mn, x)
        s += x
        s2 += x*x
        n += 1
    return Stat(n, s, s/n, sqrt(s2/n - s*s/n/n), mx, mn)

def random_int_list(size=100, start=0, end=1000):
    return (random.randrange(start, end, 1) for x in xrange(size))

if __name__ == '__main__':
    r = stat(random_int_list())
    print r  # Stat(total=100, sum=56295, avg=562, sd=294.82537204250247, max=994, min=10)
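One caveat with the sum-of-squares formula above: subtracting two large, nearly equal quantities can lose precision when the values are large relative to their spread. A commonly used alternative is Welford's online algorithm, which updates a running mean and a running sum of squared deviations instead. The sketch below is not the original poster's code. The names `running_stats` and `RunningStats` are made up for illustration, it targets Python 3, and it computes the population standard deviation (dividing by n) to match the function above.

```python
from collections import namedtuple
from math import sqrt

# Hypothetical result type; field names are illustrative.
RunningStats = namedtuple('RunningStats', 'count total mean sd max min')

def running_stats(iterable):
    """Single-pass statistics using Welford's update (illustrative helper)."""
    it = iter(iterable)
    try:
        x0 = next(it)
    except StopIteration:
        raise ValueError('running_stats() requires at least one item')
    count = 1
    total = mx = mn = x0
    mean = float(x0)
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in it:
        count += 1
        total += x
        mx = max(mx, x)
        mn = min(mn, x)
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # second factor uses the updated mean
    # Population standard deviation: divide by count, not count - 1.
    return RunningStats(count, total, mean, sqrt(m2 / count), mx, mn)

r = running_stats(x for x in [2, 4, 4, 4, 5, 5, 7, 9])
print(r)
```

For this input the mean is 5 and the population standard deviation is 2, since the squared deviations sum to 32 over 8 items.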
Recommended answer
def statistics(it):
    """Returns number of elements, sum, max, min"""
    it = iter(it)
    x0 = next(it)
    maximum = minimum = total = x0
    n = 1
    for x in it:
        maximum = max(maximum, x)
        minimum = min(minimum, x)
        total += x
        n += 1
    return n, total, maximum, minimum
Add other statistics as you please. Consider using a namedtuple when the number of statistics to compute grows large.
If you want to get really fancy, you can build an OO hierarchy of statistics collectors (untested):
class Summer(object):
    def __init__(self, x0=0):
        self.value = x0

    def add(self, x):
        self.value += x

class SquareSummer(Summer):
    def __init__(self, x0=0):
        # Seed with the square of the first element, not the element itself.
        super(SquareSummer, self).__init__(x0 ** 2)

    def add(self, x):
        super(SquareSummer, self).add(x ** 2)

class Maxer(object):
    def __init__(self, x0):
        self.value = x0

    def add(self, x):
        self.value = max(self.value, x)

# example usage: collect([Maxer, Summer], iterable)
def collect(collectors, it):
    it = iter(it)
    x0 = next(it)
    # Each collector is constructed from the first element.
    collectors = [c(x0) for c in collectors]
    for x in it:
        for c in collectors:
            c.add(x)
    return [c.value for c in collectors]
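As a usage sketch for the collector approach (Python 3 syntax), the classes can be combined to derive a mean and standard deviation. The `Count` collector and the derived values are additions for illustration and are not part of the answer. Note that `SquareSummer` here seeds itself with the square of the first element so that the sum of squares comes out right.

```python
from math import sqrt

class Summer:
    def __init__(self, x0=0):
        self.value = x0
    def add(self, x):
        self.value += x

class SquareSummer(Summer):
    def __init__(self, x0=0):
        super().__init__(x0 ** 2)  # first element must be squared too
    def add(self, x):
        super().add(x ** 2)

class Maxer:
    def __init__(self, x0):
        self.value = x0
    def add(self, x):
        self.value = max(self.value, x)

class Count:
    """Hypothetical extra collector: counts the items seen."""
    def __init__(self, x0):
        self.value = 1  # the first element has already been consumed
    def add(self, x):
        self.value += 1

def collect(collectors, it):
    it = iter(it)
    x0 = next(it)
    collectors = [c(x0) for c in collectors]
    for x in it:
        for c in collectors:
            c.add(x)
    return [c.value for c in collectors]

data = (x for x in [2, 4, 4, 4, 5, 5, 7, 9])
n, total, sq_total, largest = collect([Count, Summer, SquareSummer, Maxer], data)
mean = total / n
sd = sqrt(sq_total / n - mean ** 2)  # population standard deviation
print(n, total, largest, mean, sd)
```

The design keeps each statistic in its own small class, so adding a new one means writing another collector rather than editing the loop.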