Feature suggestion: sum() ought to use a compensated summation algorithm


Problem Description



I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
    # Kahan compensated summation: c carries the rounding error of each step
    s = 0.0   # running sum
    c = 0.0   # compensation term
    for x in arr:
        y = x - c        # apply the correction carried over from the last step
        t = s + y        # low-order digits of y can be lost in this addition
        c = (t - s) - y  # algebraically zero, but in floats it picks up the
                         # rounding error, which is re-added on the next pass
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)

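To make the compensation step concrete, here is a tiny illustrative case (an editorial example, not part of the original post): terms of 1e-16 are each too small to change a running sum of 1.0, so the plain sum() drops every one of them, while compSum() accumulates them in c:

# Illustrative only: many addends below the rounding threshold of the sum.
tiny = [1.0] + [1e-16] * 1000000

print(sum(tiny))      # 1.0 -- each 1e-16 rounds away against the running 1.0
print(compSum(tiny))  # ~1.0000000001 -- the compensation term preserves them
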
I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát

Recommended Answer

Szabolcs Horvát <sz******@gmail.com> writes:

[...]

A little research shows that Mathematica uses a "compensated
summation" algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
    s = 0.0
    c = 0.0
    for x in arr:
        y = x-c
        t = s+y
        c = (t-s) - y
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)

I thought that it would be very nice if the built-in sum() function
used this algorithm by default. Has this been brought up before?
Would this have any disadvantages (apart from a slight performance
impact, but Python is a high-level language anyway ...)?

Szabolcs Horvát




sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

--
Arnaud
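A short sketch of Arnaud's point (the examples here are editorial additions): sum() only requires that its items support +, so a float-specific algorithm inside it would break non-float uses:

# sum() relies only on __add__, so it works on non-float types too.
print(sum([[1, 2], [3], [4, 5]], []))  # list concatenation -> [1, 2, 3, 4, 5]

class Vec:
    # a minimal vector type with an __add__ method
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __add__(self, other):
        if isinstance(other, int):  # absorb sum()'s integer start value 0
            return self
        return Vec(self.x + other.x, self.y + other.y)
    __radd__ = __add__

v = sum([Vec(1, 2), Vec(3, 4)])
print((v.x, v.y))  # (4, 6)

The Kahan steps y = x - c and (t - s) - y assume IEEE floating-point rounding, which neither lists nor Vec share.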


On Sat, 03 May 2008 18:50:34 +0200, Szabolcs Horvát wrote:

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm gives us a result
around ~ 10^-17:

def compSum(arr):
    s = 0.0
    c = 0.0
    for x in arr:
        y = x-c
        t = s+y
        c = (t-s) - y
        s = t
    return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)

I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát




Built-in sum should work with everything, not just floats. I think it
would be a useful addition to the standard math module.

If you know C you could write a patch to mathmodule.c and submit it to
the Python devs.

--
Ivan
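A later footnote (not part of the 2008 thread): Python 2.6 went on to add exactly this to the math module as math.fsum(), which tracks exact partial sums (Shewchuk's algorithm) and rounds only once, at the end. Redoing the original experiment with it:

# math.fsum (Python 2.6+) returns a correctly rounded sum of the inputs.
import math
from random import random

data = [random() for _ in range(1000000)]
mean = math.fsum(data) / len(data)
print(math.fsum(x - mean for x in data) / len(data))  # comparable to the Kahan result, ~1e-17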


Szabolcs Horvát <sz******@gmail.com> wrote:

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:




I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with the most accuracy is to sum from the
smallest magnitude to the largest magnitude.

E.g.


>>> from random import random
>>> data = [random() for x in xrange(1000000)]
>>> mean = sum(data)/len(data)
>>> print sum(x - mean for x in data)/len(data)
1.64152134108e-14

>>> mean = sum(sorted(data))/len(data)
>>> print sum(x - mean for x in data)/len(data)
5.86725534824e-15

>>>



It isn't as good as the compensated sum but it is easy!
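To see why summing in order of increasing magnitude helps, a toy editorial example: small terms vanish when added to an already-large partial sum, but survive if they are combined with each other first:

# Each 1.0 is below the rounding threshold of a 1e16 partial sum.
vals = [1e16, 1.0, 1.0, 1.0, 1.0]

print(sum(vals))          # 1e+16 -- the 1.0s are rounded away one by one
print(sum(sorted(vals)))  # 1.0000000000000004e+16 -- they accumulate first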


I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?



sum() is used for any numerical type, not just floats ...


--
Nick Craig-Wood <ni**@craig-wood.com> -- http://www.craig-wood.com/nick

