Long (>20 million element) array summation in Python NumPy


Question

I am new to Python and NumPy, so please excuse me if this problem is rudimentary. I have a sorted array of negative values:

>>> neg
[ -1.53507843e+02  -1.53200012e+02  -1.43161987e+02 ...,  -6.37326136e-1 -3.97518490e-10  -3.73480691e-10]
>>> neg.shape
(12922508,)

I need to join this array with a mirrored copy of itself (the same values with positive sign) to find the standard deviation of the resulting distribution, whose mean should be zero. So I do the following:

>>> pos = -1 * neg
>>> pos = pos[::-1]  # Just to make it look symmetric for the display below!
>>> total = np.hstack((neg, pos))
>>> total
[-153.50784302 -153.20001221 -143.1619873  ...,  143.1619873   153.20001221  153.50784302]
>>> total.shape
(25845016,)

So far everything looks fine, but the strange thing is that the sum of this new array is not zero:

>>> numpy.sum(total)
11610.6

The standard deviation is also nowhere near what I was expecting, but I guess the root of that problem is the same: why doesn't the sum come out as zero?

When I apply this method to a small array, for example [-5, -3, -2], the sum is zero. So I guess the problem lies in the length of the array (over 20 million elements). Is there any way to deal with this problem?

If anyone could help me with this, I would be most grateful.

Recommended answer

As noted in the comments, you get floating-point roundoff problems from summing many millions of same-signed numbers: the running total grows so large that each new addend loses low-order digits when it is added. One possible way around this is to interleave positive and negative numbers in the combined array, so that every intermediate result during the summation stays within roughly the same order of magnitude:

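import numpy

n = 20_000_000  # rand() needs an integer size; 20e6 is a float
neg = -100 * numpy.random.rand(n)
pos = -neg
# Interleave negatives and positives so the running sum stays near zero
combined = numpy.zeros(len(neg) + len(pos))
combined[::2] = neg
combined[1::2] = pos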

Now combined.sum() should be pretty close to zero.
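A complementary option, sketched here under an assumption the question does not confirm (that the array is stored as float32), is to leave the element order alone and ask for a wider accumulator instead. The `dtype` argument of `sum` makes NumPy accumulate in float64, and `math.fsum` from the standard library returns a correctly rounded sum at the cost of speed. The array size, seed, and variable names below are illustrative only.

```python
import math

import numpy as np

# Small stand-in for the asker's data: sorted negative values stored as float32.
# (Assumption: the real array is float32; the question does not state its dtype.)
rng = np.random.default_rng(0)
neg = np.sort(-100 * rng.random(100_000)).astype(np.float32)
pos = -neg[::-1]               # mirrored positive copy, as in the question
total = np.hstack((neg, pos))  # still float32

# Default: accumulates in the array's own dtype (float32 here).
sum_default = total.sum()

# Accumulate in float64 without copying or reordering the array.
sum_f64 = total.sum(dtype=np.float64)

# Correctly rounded sum; slower, but independent of element order.
sum_exact = math.fsum(total.tolist())

print(sum_default, sum_f64, sum_exact)
```

Because every positive value has an exact negative counterpart, the true sum is zero, so `math.fsum` returns exactly 0.0; the float64 accumulator should land very close to it.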

This approach may also improve the precision of the standard deviation computation.
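For the standard deviation specifically, the mirrored array may not be needed at all. Since the combined distribution is symmetric about zero, its mean is zero and its (population) variance is just the mean of the squared values, which is the same for `neg` and for the mirrored array. The following is a minimal sketch of that shortcut, not the answerer's method; the data, seed, and variable names are made up for illustration.

```python
import numpy as np

# Illustrative data only: sorted negative values, as in the question.
rng = np.random.default_rng(1)
neg = np.sort(-100 * rng.random(50_000))

# Shortcut: with a mean of zero, std = sqrt(mean(x**2)), and squaring
# removes the sign, so neg alone is enough. Squares are computed in float64.
std_from_neg = np.sqrt(np.mean(np.square(neg, dtype=np.float64)))

# Reference: build the mirrored array and let NumPy compute the std,
# again with a float64 accumulator.
total = np.hstack((neg, -neg[::-1]))
std_full = total.std(dtype=np.float64)

print(std_from_neg, std_full)
```

The two values should agree to many digits, and the shortcut avoids allocating the array of twice the length.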
