Comparing NumPy arange and custom range function for producing ranges with decimal increments


Question



Here's a custom function that allows stepping through decimal increments:

def my_range(start, stop, step):
    i = start
    while i < stop:
        yield i
        i += step

It works like this:

out = list(my_range(0, 1, 0.1))
print(out)

[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999]

Now, there's nothing surprising about this. Understandably, it happens because of floating point inaccuracies: 0.1 has no exact representation in memory, so these precision errors are expected.
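To make the representation issue concrete, here's a quick standard-library check showing the exact value the float literal 0.1 actually stores, and the drift from adding it repeatedly:

```python
from decimal import Decimal

# Decimal(float) reveals the exact binary64 value behind the literal 0.1:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Adding it ten times undershoots 1.0, because each intermediate sum
# is rounded back to the nearest representable double:
total = 0.0
for _ in range(10):
    total += 0.1
print(total)         # 0.9999999999999999
print(total == 1.0)  # False
```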

Take numpy on the other hand:

import numpy as np

out = np.arange(0, 1, 0.1)
print(out)
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]

What's interesting is that there is no visible imprecision introduced here. I thought this might have to do with what the __repr__ shows, so to confirm, I tried this:

x = list(my_range(0, 1.1, 0.1))[-1]
print(x.is_integer())

False

x = list(np.arange(0, 1.1, 0.1))[-1]
print(x.is_integer())

True

So, my function returns an incorrect upper value (it should be 1.0 but it is actually 1.0999999999999999), but np.arange does it correctly.

I'm aware of "Is floating point math broken?", but the point of this question is:

How does numpy do this?

Solution

The difference in endpoints is because NumPy calculates the length up front instead of ad hoc, because it needs to preallocate the array. You can see this in the _calc_length helper. Instead of stopping when it hits the end argument, it stops when it hits the predetermined length.

Calculating the length up front doesn't save you from the problems of a non-integer step, and you'll frequently get the "wrong" endpoint anyway, for example, with numpy.arange(0.0, 2.1, 0.3):

In [46]: numpy.arange(0.0, 2.1, 0.3)
Out[46]: array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8,  2.1])
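A rough Python sketch of that up-front length computation (the real `_calc_length` helper is C code inside NumPy; this only illustrates the idea):

```python
import math

import numpy as np

def arange_length(start, stop, step):
    # NumPy allocates ceil((stop - start) / step) elements up front and
    # then fills them by index; it never compares values against `stop`.
    return max(0, math.ceil((stop - start) / step))

# 2.1 / 0.3 evaluates to 7.000000000000001 in floating point, so the
# ceiling is 8 and the array runs all the way to 2.1 instead of 1.8:
print(arange_length(0.0, 2.1, 0.3))   # 8
print(len(np.arange(0.0, 2.1, 0.3)))  # 8

# 1.1 / 0.1 happens to round to exactly 11.0, so there are 11 elements
# and the last one is 1.0 -- the "correct" endpoint from the question:
print(arange_length(0, 1.1, 0.1))     # 11
print(len(np.arange(0, 1.1, 0.1)))    # 11
```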

It's much safer to use numpy.linspace, where instead of the step size, you say how many elements you want and whether you want to include the right endpoint.
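For instance, the same range can be written with an explicit count, and the endpoint is then hit exactly:

```python
import numpy as np

# 8 evenly spaced values from 0.0 to 2.1, endpoint included by default:
out = np.linspace(0.0, 2.1, num=8)
print(out)

# linspace writes the requested endpoint into the array directly,
# so the last element is exactly the float 2.1:
print(out[-1] == 2.1)  # True
```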


It might look like NumPy has suffered no rounding error when calculating the elements, but that's just due to different display logic. NumPy is truncating the displayed precision more aggressively than float.__repr__ does. If you use tolist to get an ordinary list of ordinary Python scalars (and thus the ordinary float display logic), you can see that NumPy has also suffered rounding error:

In [47]: numpy.arange(0, 1, 0.1).tolist()
Out[47]: 
[0.0,
 0.1,
 0.2,
 0.30000000000000004,
 0.4,
 0.5,
 0.6000000000000001,
 0.7000000000000001,
 0.8,
 0.9]

It's suffered slightly different rounding error - for example, in .6 and .7 instead of .8 and .9 - because it also uses a different means of computing the elements, implemented in the fill function for the relevant dtype.

The fill function implementation has the advantage that it uses start + i*step instead of repeatedly adding the step, which avoids accumulating error on each addition. However, it has the disadvantage that (for no compelling reason I can see) it recomputes the step from the first two elements instead of taking the step as an argument, so it can lose a great deal of precision in the step up front.
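In Python terms, that fill strategy for float64 looks roughly like this (a sketch of the idea, not the actual C code):

```python
import numpy as np

def arange_fill(start, step, length):
    # Sketch of NumPy's float64 fill function: write the first two
    # elements directly, recompute the step as their difference, then
    # compute every later element as start + i*delta -- multiplication
    # by the index, so error doesn't accumulate addition by addition.
    buf = [0.0] * length
    if length > 0:
        buf[0] = start
    if length > 1:
        buf[1] = start + step
        delta = buf[1] - buf[0]  # the step recomputed from the data
        for i in range(2, length):
            buf[i] = buf[0] + i * delta
    return buf

print(arange_fill(0.0, 0.1, 10))
print(np.arange(0.0, 1.0, 0.1).tolist())  # identical values
```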
