Numpy too weak to calculate a precise mean value


Question

This question is very similar to this post, but not quite the same.

I have some data in a .csv file. The data has precision to the 4th decimal digit (#.####).

Calculating the mean in Excel or SAS gives a result with precision to the 5th digit (#.#####), but using numpy:

import numpy as np
data = np.recfromcsv(path2file, delimiter=';', names=['measurements'], dtype=np.float64)
rawD = data['measurements']
print(np.average(rawD))

gives a number like this

#.#####999999999994

Clearly something is wrong..

Using

from math import fsum
print(fsum(rawD.ravel())/rawD.size)

gives

#.#####
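The gap between the two results comes from how the intermediate additions are rounded. As a minimal sketch, using a plain left-to-right loop as a stand-in (an assumption; it is not how np.average accumulates internally), ten copies of 0.1 show the effect:

```python
from math import fsum

values = [0.1] * 10   # 0.1 has no exact binary representation

# Naive accumulation: every addition rounds to the nearest float.
naive_total = 0.0
for v in values:
    naive_total += v

# fsum tracks the lost low-order bits and rounds only once at the end.
exact_total = fsum(values)

print(repr(naive_total))   # 0.9999999999999999
print(repr(exact_total))   # 1.0
```

The difference is a single unit in the last place, which is harmless unless the result sits right at a rounding boundary.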

Is there anything in np.average that I set wrong _______?

Bonus info:

I'm only working with 200 data points in the array.

I thought I should make my case clearer.

I have numbers like 4.2730 in my csv (giving 4 decimal places of precision, even though the 4th is always zero [not part of the subject, so don't mind that]).

Calculating the average/mean with numpy gives me this

4.2516499999999994

which prints as

>>> print("%.4f" % np.average(rawD))
4.2516

Doing the same thing in Excel or SAS gives me this:

4.2517

which I actually believe to be the true average, because it finds the value to be 4.25165. This code also illustrates it:

answer = 0
for number in rawD:
    answer += int(number*1000)
print(answer // 2)
425165

So how do I tell np.average() to calculate this value ___?

I'm a bit surprised that numpy did this to me... I thought I only needed to worry if I was dealing with 16-digit numbers. I didn't expect rounding at the 4th decimal place to be affected by this..
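Double precision does carry roughly 16 significant digits, but a value sitting exactly on a rounding tie can be tipped by an error in the very last of them. A small sketch of that, assuming Python 3.9+ (for math.ulp and math.nextafter) and taking 4.25165 from the text above as the tie value:

```python
import math

x = 4.25165

# Spacing between neighbouring floats near x: 2**-50, roughly 8.9e-16.
spacing = math.ulp(x)
print(spacing == 2.0 ** -50)      # True

# A mean that lands even one float below the tie rounds down at 4 decimals.
below = math.nextafter(x, 0.0)    # nearest float smaller than x
print("%.4f" % below)             # 4.2516
```

So the error itself is in the 16th digit, as expected; it only becomes visible in the 4th decimal because the true mean ends in exactly 5.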

I know I could use

fsum(rawD.ravel())/rawD.size

But I also have other things (like std) that I want to calculate with the same precision.

I thought I could get around it with

>>> print("%.4f" % np.float64("%.5f" % np.mean(rawD)))
4.2416

Which did not solve the case. Then I tried

>>> print("%.4f" % float("4.24165"))
4.2416

AHA! There is a bug in the formatter: Issue 5118

To be honest, I don't care if Python stores 4.24165 as 4.241649999... It's still a round-off error, NO MATTER WHAT.

If the interpreter can figure out how to display the number

>>> print(float("4.24165"))
4.24165

then the formatter should be able to as well, and work with that number when rounding..
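The two outputs can be reconciled without assuming a formatter bug. A hedged sketch, assuming Python 3 (or 2.7+), where the default display is the shortest string that reads back as the same float, while "%.4f" rounds the exact binary value actually stored:

```python
from decimal import Decimal

x = float("4.24165")

# Shortest decimal string that round-trips to the same float.
print(repr(x))                            # 4.24165

# Decimal(x) converts the stored binary value exactly, with no rounding.
# It lies slightly below the decimal tie 4.24165 ...
print(Decimal(x) < Decimal("4.24165"))    # True

# ... so correct rounding to 4 places goes down.
print("%.4f" % x)                         # 4.2416
```

In other words, the display and the formatter answer different questions: one asks for a short unambiguous label for the float, the other rounds the number the float really holds.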

It still doesn't change the fact that I have a round-off problem (now with both the formatter and numpy).

In case you need some numbers to help me out, I have made this modified .csv file:

Download from here

(I'm aware that this file does not have the number of digits I described earlier, and that the average ends in ..9988 instead of ..9994; it's modified.)

I guess my question boils down to this: how do I get a string output like the one Excel gives me when I use =average(), and have it round correctly if I choose to show only 4 digits?

I know this might seem strange to some.. but I have my reasons for wanting to reproduce Excel's behavior.

Any help would be appreciated, thank you.

Recommended answer

To get exact decimal numbers, you need to use decimal arithmetic instead of binary. Python provides the decimal module for this.
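As a rough sketch of the all-decimal route, the fields can be parsed straight from text into Decimal so they never pass through a binary float. The two sample values and the in-memory file below are hypothetical stand-ins for the real .csv, and the standard deviation uses the population formula (ddof=0, the default of np.std):

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical stand-in for the .csv file: one ';'-delimited measurement per line.
sample = io.StringIO("4.2730\n4.2303\n")

# Parse each field directly from its text form; no float conversion happens.
values = [Decimal(row[0]) for row in csv.reader(sample, delimiter=";") if row]

n = len(values)
mean = sum(values) / n
variance = sum((v - mean) ** 2 for v in values) / n   # population variance
std = variance.sqrt()

four_places = Decimal("0.0001")
print(mean)                                                  # 4.25165
print(mean.quantize(four_places, rounding=ROUND_HALF_UP))    # 4.2517
print(std.quantize(four_places, rounding=ROUND_HALF_UP))     # 0.0214
```

This trades speed for exactness, which is unlikely to matter for a couple of hundred data points.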

If you want to continue to use numpy for the calculations and simply round the result, you can still do this with decimal. You do it in two steps: first round to a large number of digits to eliminate the accumulated error, then round to the desired precision. The quantize method is used for rounding.

from decimal import Decimal,ROUND_HALF_UP
ten_places = Decimal('0.0000000001')
four_places = Decimal('0.0001')
mean = 4.2516499999999994
print(Decimal(mean).quantize(ten_places).quantize(four_places, rounding=ROUND_HALF_UP))
4.2517
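Since the same rounding is wanted for other statistics such as std, the two steps could be wrapped in a small helper. This is only a sketch: the name round_half_up and its default arguments are hypothetical, and it assumes the accumulated error is far smaller than the guard precision.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places=4, guard_places=10):
    """Round a float half-up to `places` decimals.

    The value is first snapped to `guard_places` decimals to absorb
    accumulated binary error. (Hypothetical helper for illustration.)
    """
    guard = Decimal(1).scaleb(-guard_places)     # e.g. Decimal('1E-10')
    target = Decimal(1).scaleb(-places)          # e.g. Decimal('1E-4')
    snapped = Decimal(value).quantize(guard)     # step 1: remove float noise
    return snapped.quantize(target, rounding=ROUND_HALF_UP)   # step 2

print(round_half_up(4.2516499999999994))   # 4.2517
print(round_half_up(4.24165))              # 4.2417
```

The result of np.mean(rawD) or np.std(rawD) can be passed in the same way, for example round_half_up(float(np.std(rawD))).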
