Error with numpy array calculations using int dtype (it fails to cast dtype to 64 bit automatically when needed)

Problem description

I'm encountering a problem with incorrect numpy calculations when the inputs to a calculation are a numpy array with a 32-bit integer data type, but the outputs include larger numbers that require 64-bit representation.

Here's a minimal working example:

import numpy as np
arr = np.ones(5, dtype=int) * (2**24 + 300)  # arr.dtype is 'int32' where the default int is 32-bit (e.g. Windows with NumPy < 2.0)

# Following comment from @hpaulj I changed the first line, which was originally:
# arr = np.zeros(5, dtype=int) 
# arr[:] = 2**24 + 300

single_value_calc = 2**8 * (2**24 + 300)
numpy_calc = 2**8 * arr

print(single_value_calc)
print(numpy_calc[0])

# RESULTS
4295044096
76800

The desired output is that the numpy array contains the correct value of 4295044096, which requires 64 bits to represent it. That is, I would have expected numpy arrays to automatically upcast from int32 to int64 when the output requires it, rather than keeping a 32-bit output that silently wraps around (modulo 2^32) once the result no longer fits.
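A minimal sketch reproducing the wrap-around on any platform, assuming only NumPy: it requests `np.int32` explicitly (rather than `dtype=int`, whose width depends on the platform) and compares the array result with the exact Python integer result reduced modulo 2^32.

```python
import numpy as np

# Request int32 explicitly so the overflow reproduces regardless of
# what dtype=int maps to on this platform.
arr = np.full(5, 2**24 + 300, dtype=np.int32)
numpy_calc = 2**8 * arr  # the Python int 2**8 does not upcast the array

exact = 2**8 * (2**24 + 300)  # Python ints never overflow: 4295044096
wrapped = exact % 2**32       # int32 keeps only the low 32 bits; 76800 < 2**31,
                              # so the signed int32 value is the same number

print(numpy_calc.dtype)       # int32
print(exact, wrapped, int(numpy_calc[0]))
```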

Of course, I can fix the problem manually by forcing int64 representation:

numpy_calc2 = 2**8 * arr.astype('int64')

but this is undesirable for general code, since the output will only need 64-bit representation (i.e. to hold large numbers) in some cases and not all. In my use case, performance is critical so forcing upcasting every time would be costly.

Is this the intended behaviour of numpy arrays? And if so, is there a clean, performant solution please?

Recommended answer

Type casting and promotion in numpy is fairly complicated and occasionally surprising. This recent unofficial write-up by Sebastian Berg explains some of the nuances of the subject (mostly concentrating on scalars and 0d arrays).

Quoting from that document:

Python Integers and Floats

Note that python integers are handled exactly like numpy ones. They are, however, special in that they do not have a dtype associated with them explicitly. Value based logic, as described here, seems useful for python integers and floats to allow:

import numpy as np
arr = np.arange(10, dtype=np.int8)
arr += 1
# or:
res = arr + 1
res.dtype == np.int8

so that no upcast happens (e.g. one that would increase memory usage).

(emphasis mine)
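To make the quoted rule concrete, a small sketch (the values are illustrative, not taken from Berg's write-up): adding the Python int `1` to an `int8` array keeps the result `int8`, even when an element overflows.

```python
import numpy as np

arr = np.array([100, 127], dtype=np.int8)
res = arr + 1  # the Python int 1 does not upcast the int8 array

print(res.dtype)     # int8
print(res.tolist())  # [101, -128]: 127 + 1 wraps around to -128
```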

See also Allan Haldane's gist suggesting C-style type coercion, linked from the previous document:

Currently, when two dtypes are involved in a binary operation numpy's principle is that "the output dtype's range covers the range of both input dtypes", and when a single dtype is involved there is never any cast.

(emphasis again mine.)
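A short sketch of the principle in the quote, using `np.promote_types` (which computes the promoted dtype from two dtypes alone, without looking at any values): mixing two dtypes yields one whose range covers both, while an operation on a single dtype is never cast, even when the true result does not fit.

```python
import numpy as np

# Two dtypes involved: the output dtype covers both input ranges.
print(np.promote_types(np.int32, np.int64))   # int64
print(np.promote_types(np.int8, np.uint8))    # int16
print(np.promote_types(np.int32, np.uint32))  # int64

# One dtype involved: no cast, although 2**20 * 2**20 = 2**40 needs 64 bits.
a = np.array([2**20], dtype=np.int32)
prod = a * a
print(prod.dtype, prod.tolist())              # int32 [0]  (2**40 mod 2**32 == 0)
```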

So my understanding is that the promotion rules for numpy scalars and arrays differ, primarily because it's not feasible to check every element inside an array to determine whether casting can be done safely. Again from the former document:

Scalar based rules

Unlike arrays, where inspection of all values is not feasible, for scalars (and 0-D arrays) the value is inspected.

This means you can either use np.int64 from the start to be safe (on 64-bit Linux, dtype=int already gives you int64), or check the maximum value of your arrays before suspect operations and decide, case by case, whether you have to promote the dtype yourself. I understand that this might not be feasible if you are doing a lot of calculations, but I don't believe there is a way around this given numpy's current type promotion rules.
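A minimal sketch of the second option, under stated assumptions: `scale_checked` is a hypothetical helper (not a NumPy function) that multiplies a non-empty signed integer array by a Python int. It bounds the result using the array's minimum and maximum (each one a full pass over the data, which is the per-element cost discussed above) and upcasts only when the result would not fit. In the upcast branch it passes `dtype=np.int64` to the `np.multiply` ufunc, which should let NumPy cast while computing rather than first building a separate `astype('int64')` copy.

```python
import numpy as np

def scale_checked(arr: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: arr * factor, upcast to int64 only on overflow.

    Assumes arr is non-empty, has a signed integer dtype, and that the
    product fits in int64.
    """
    info = np.iinfo(arr.dtype)
    # Python ints cannot overflow, so these bounds are exact. The product is
    # linear in each element, so its extremes come from arr.min() and arr.max().
    lo = int(arr.min()) * factor
    hi = int(arr.max()) * factor
    # The factor itself must also fit in the array's dtype.
    fits = (info.min <= factor <= info.max
            and info.min <= min(lo, hi) and max(lo, hi) <= info.max)
    if fits:
        return arr * factor  # keeps the narrow dtype
    # Ask the ufunc to compute in int64 directly (no astype copy beforehand).
    return np.multiply(arr, factor, dtype=np.int64)


arr = np.full(5, 2**24 + 300, dtype=np.int32)
print(scale_checked(arr, 2).dtype)       # int32
print(scale_checked(arr, 2**8).dtype)    # int64
print(int(scale_checked(arr, 2**8)[0]))  # 4295044096
```

Whether the extra min/max passes cost less than always working in int64 depends on the workload, so it is worth timing both on real data.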
