错误的numpy平均值? [英] Wrong numpy mean value?

查看:122
本文介绍了错误的numpy平均值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我通常会进行大量的模拟.有时,我需要计算一组粒子的质心.我注意到在许多情况下,numpy.mean()返回的平均值是错误的.我可以弄清楚这是由于累加器饱和所致.为了避免出现此问题,我可以将总和分成小部分粒子,但是这很不舒服.有人知道如何以一种优雅的方式解决这个问题吗?

I usually work with huge simulations. Sometimes, I need to compute the center of mass of the set of particles. I noted that in many situations, the mean value returned by numpy.mean() is wrong. I can figure out that it is due to a saturation of the accumulator. In order to avoid the problem, I can split the summation over all particles in small set of particles, but it is uncomfortable. Anybody has and idea of how to solve this problem in an elegant way?

仅仅是为了激发您的好奇心,下面的示例所产生的结果类似于我在模拟中观察到的结果:

Just for piking up your curiosity, the following example produce something similar to what I observe in my simulations:

import numpy as np
a = np.ones((1024,1024), dtype=np.float32)*30504.00005

如果检查最大值和最小值,则会得到:

if you check the max and min values, you get:

a.max() 
30504.0
a.min() 
30504.0

但是,平均值是:

a.mean()
30687.236328125

您可以确定这里有问题.使用dtype = np.float64时不会发生这种情况,因此很好地解决单精度问题.

You can figure out that something is wrong here. This is not happening when using dtype=np.float64, so it should be nice to solve the problem for single precision.

推荐答案

这不是NumPy问题,而是浮点问题.在C中也是如此:

This isn't a NumPy problem, it's a floating-point issue. The same occurs in C:

float acc = 0;
for (int i = 0; i < 1024*1024; i++) {
    acc += 30504.00005f;
}
acc /= (1024*1024);
printf("%f\n", acc);  // 30687.304688

(实时演示)

问题在于浮点精度有限;随着累加器值相对于要添加到累加器中的元素的增加,相对精度会下降.

The problem is that floating-point has limited precision; as the accumulator value grows relative to the elements being added to it, the relative precision drops.

一种解决方案是通过构造加法器树来限制相对增长.这是C语言中的一个示例(我的Python不够好...):

One solution is to limit the relative growth, by constructing an adder tree. Here's an example in C (my Python isn't good enough...):

float sum(float *p, int n) {
    if (n == 1) return *p;
    for (int i = 0; i < n/2; i++) {
        p[i] += p[i+n/2];
    }
    return sum(p, n/2);
}

float x[1024*1024];
for (int i = 0; i < 1024*1024; i++) {
    x[i] = 30504.00005f;
}

float acc = sum(x, 1024*1024);

acc /= (1024*1024);
printf("%f\n", acc);   // 30504.000000

(实时演示)

这篇关于错误的numpy平均值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆