Assigning dtype value using array.dtype = <data type> in NumPy arrays gives ambiguous results


Question

I am new to programming and NumPy. While reading tutorials and experimenting in a Jupyter notebook, I tried converting the dtype of a NumPy array as follows:

import numpy as np
c = np.random.rand(4)*10
print(c)
#Output1: [ 0.12757225  5.48992242  7.63139022  2.92746857]
c.dtype = int
print(c)
#Output2: [4593764294844833304 4617867121563982285 4620278199966380988 4613774491979221856]

I know the proper way to change the dtype is:

c = c.astype(int)

But I want to know the reason behind those strange numbers in Output2. What are they, and what do they signify?

Recommended answer

Floats and integers (numpy.float64 and numpy.int64 values) are represented differently in memory. The value 42 stored as each of these types corresponds to a different bit pattern in memory.
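To make the difference concrete, here is a minimal sketch that prints the 64 bits of 42 stored as an integer and as a double. It uses only the standard library's struct module rather than NumPy, and the helper name bits64 is made up for this illustration.

```python
import struct


def bits64(packed: bytes) -> str:
    """Return the 64 bits of an 8-byte big-endian value as a string of 0s and 1s."""
    return format(int.from_bytes(packed, "big"), "064b")


# ">q" packs a 64-bit signed integer, ">d" packs an IEEE 754 double.
int_bits = bits64(struct.pack(">q", 42))
float_bits = bits64(struct.pack(">d", 42.0))

print("int64  :", int_bits)
print("float64:", float_bits)
print("same pattern?", int_bits == float_bits)
```

The integer pattern is just 42 in binary, padded with zeros, while the float pattern spreads the value across sign, exponent, and fraction fields.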

When you reassign the dtype attribute of an array, you keep the underlying data unchanged and tell NumPy to interpret that pattern of bits in a new way. Since the new interpretation doesn't match the type the data was originally written as, you end up with gibberish (meaningless numbers).

On the other hand, converting your array via .astype() actually converts the data. Compare the two, starting with what reassigning dtype does:

>>> import numpy as np
>>> arr = np.random.rand(3)
>>> arr.dtype
dtype('float64')
>>> arr
array([ 0.7258989 ,  0.56473195,  0.20885672])
>>> arr.data
<memory at 0x7f10d7061288>
>>> arr.dtype = np.int64
>>> arr.data
<memory at 0x7f10d7061348>
>>> arr
array([4604713535589390862, 4603261872765946451, 4596692876638008676])

And the proper conversion:

>>> arr = np.random.rand(3)*10
>>> arr
array([ 3.59591191,  1.21786042,  6.42272461])
>>> arr.astype(np.int64)
array([3, 1, 6])

As you can see, astype meaningfully converts the original values of the array. In this case it truncates each value to its integer part and returns a new array with the corresponding values and dtype.
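A short sketch of those two properties, using made-up sample values: astype truncates toward zero (so negative values move up, not down), and the result is a separate array that leaves the original untouched.

```python
import numpy as np

original = np.array([3.7, -1.2, 6.9])    # sample values chosen for illustration
converted = original.astype(np.int64)     # converts values, allocates a new array

print(converted)                               # [ 3 -1  6]
print(original)                                # still the float values
print(np.shares_memory(original, converted))   # False: separate buffers
```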

Note that assigning a new dtype doesn't check whether the new interpretation is meaningful, so you can do very weird things to your array. In the example above, the 64 bits of each float were reinterpreted as a 64-bit integer. But you can also change the item size:

>>> arr = np.random.rand(3)
>>> arr.shape
(3,)
>>> arr.dtype
dtype('float64')
>>> arr.dtype = np.float32
>>> arr.shape
(6,)
>>> arr
array([  4.00690371e+35,   1.87285304e+00,   8.62005305e+13,
         1.33751166e+00,   7.17894062e+30,   1.81315207e+00], dtype=float32)

By telling NumPy that each element occupies half the space it originally did, you make NumPy deduce that your array has twice as many elements! Clearly not something you should ever want to do.

Another example: consider the 8-bit unsigned integer 255 == 2**8-1, which is 11111111 in binary. Now try to reinterpret two of these numbers as a single 16-bit unsigned integer:

>>> arr = np.array([255,255],dtype=np.uint8)
>>> arr.dtype = np.uint16
>>> arr
array([65535], dtype=uint16)

As you can see, the result is the single number 65535. If that doesn't ring a bell, it's exactly 2**16-1, with 16 ones in its binary pattern. The two all-ones patterns were reinterpreted as a single 16-bit number, and the result changed accordingly. The reason you often see weirder numbers is that reinterpreting floats as ints, or vice versa, mangles the data much more strongly because of how floating-point numbers are represented in memory.
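As a rough illustration of that float layout, the sketch below takes the first number from Output2 in the question and splits it into the sign, exponent, and fraction fields of an IEEE 754 double. It assumes the usual 1/11/52-bit layout of float64 and uses only the standard library; the variable names are chosen for this example.

```python
import struct

raw = 4593764294844833304  # first value of Output2 in the question

# IEEE 754 double: 1 sign bit, 11 exponent bits, 52 fraction bits.
sign = raw >> 63
exponent = (raw >> 52) & 0x7FF
fraction = raw & ((1 << 52) - 1)

# Rebuild the float by hand from its three fields (normal numbers only).
value = (-1) ** sign * (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)

# Cross-check: reinterpret the same 8 bytes as a double directly.
recovered = struct.unpack("<d", struct.pack("<q", raw))[0]

print(sign, exponent)     # 0 1020
print(value, recovered)   # both close to the 0.12757225 shown in Output1
```

So the huge integers are not random: each one is the original float's bit pattern read as a plain binary number, and the large exponent field near the top of the word is what pushes them into the 4.6e18 range.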

As hpaulj noted, you can perform this reinterpretation of the data directly by constructing a new view of the array with a modified dtype. This is probably more useful than reassigning the dtype of an existing array, but then again, reinterpreting the dtype is only useful in fairly rare, very specific use cases.
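A minimal sketch of the view approach, with sample values picked for illustration: ndarray.view returns a second array object over the same buffer, so the original keeps its dtype while the view reads the same bits as integers.

```python
import numpy as np

floats = np.array([1.0, 2.0, 0.5])   # sample values for illustration
as_ints = floats.view(np.int64)      # same bytes, interpreted as int64

print(floats.dtype, as_ints.dtype)         # float64 int64
print(np.shares_memory(floats, as_ints))   # True: no data was copied

one_bits = int(as_ints[0])
print(hex(one_bits))                       # 0x3ff0000000000000, the bits of 1.0

# Writing through the view changes the original, since the memory is shared.
as_ints[0] = 0                             # all-zero bits mean 0.0 as a float64
print(floats[0])                           # 0.0
```

Because both arrays alias one buffer, this is still a reinterpretation and not a conversion; astype remains the tool for changing values from one numeric type to another.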
