Python: confusion between types and dtypes
Question
Suppose I enter:
a = uint8(200)
a*2
Then the result is 400, and it is recast to be of type uint16.
However:
a = array([200],dtype=uint8)
a*2
The result is
array([144], dtype=uint8)
The multiplication has been performed modulo 256, to ensure that the result stays in one byte.
I'm confused about "types" and "dtypes" and where one is used in preference to the other. As you can see, the type may make a significant difference to the output.
Can I, for example, create a single number of dtype uint8, so that operations on that number will be performed modulo 256? Alternatively, can I create an array of type (not dtype) uint8 so that operations on it will produce values outside the range 0-255?
Recommended answer
The type of a NumPy array is numpy.ndarray; this is just the type of Python object it is (similar to how type("hello") is str, for example).
dtype just defines how the bytes in memory are interpreted by a scalar (i.e. a single number) or an array, and how those bytes are treated (e.g. int/float). For that reason you don't change the type of an array or scalar, just its dtype.
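To make the distinction concrete, here is a minimal sketch (the variable names are illustrative, not from the question). It shows that type() reports the Python class while dtype reports how the bytes are interpreted, and that the same bytes can be read under a different dtype. The '<u4' string is used to pin the byte order to little-endian so the result does not depend on the machine.

```python
import numpy as np

arr = np.array([200], dtype=np.uint8)
scalar = np.uint8(200)

# type() reports the Python class of the object
print(type(arr))      # <class 'numpy.ndarray'>
print(type(scalar))   # <class 'numpy.uint8'>

# dtype reports how the underlying bytes are interpreted
print(arr.dtype, scalar.dtype)   # uint8 uint8

# The same four bytes under two interpretations:
# '<u4' means little-endian unsigned 32-bit integer
raw = np.array([1, 0, 0, 0], dtype=np.uint8)
as_u32 = raw.view('<u4')
print(as_u32)         # [1]
```

Note that view() reinterprets the existing bytes without copying them, which is why four uint8 elements become a single 32-bit element.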
As you observe, if you multiply two scalars, the resulting datatype is the smallest "safe" type to which both values can be cast. However, multiplying an array and a scalar will simply return an array of the same datatype. The documentation for the function np.result_type is clear about when a particular scalar or array object's dtype is changed:
Type promotion in NumPy works similarly to the rules in languages like C++, with some slight differences. When both scalars and arrays are used, the array's type takes precedence and the actual value of the scalar is taken into account.
The documentation continues:
If there are only scalars or the maximum category of the scalars is higher than the maximum category of the arrays, the data types are combined with promote_types to produce the return value.
So for np.uint8(200) * 2, two scalars, the resulting datatype will be the type returned by np.promote_types:
>>> np.promote_types(np.uint8, int)
dtype('int32')
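As a rough illustration of what np.promote_types returns, the sketch below combines a few pairs of dtypes. The first two results follow from the sizes and signedness of the types. The third depends on which integer width NumPy maps Python's int to, which varies by platform and NumPy version, so it may be int32 (as in the output above) or int64 on your machine.

```python
import numpy as np

# Two unsigned types: the wider one is enough to hold both
u = np.promote_types(np.uint8, np.uint16)

# Unsigned and signed of the same width: a wider signed type is
# needed to hold every value of both (0..255 and -128..127)
s = np.promote_types(np.uint8, np.int8)

# Python's int maps to NumPy's default integer, whose width is
# platform- and version-dependent
d = np.promote_types(np.uint8, int)

print(u)   # uint16
print(s)   # int16
print(d)   # int32 or int64, depending on the platform
```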
For np.array([200], dtype=np.uint8) * 2 the array's datatype takes precedence over the scalar int and a np.uint8 datatype is returned.
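A short check of the array case, using the same values as the question: the result keeps the array's uint8 dtype, so 400 wraps around modulo 256.

```python
import numpy as np

a = np.array([200], dtype=np.uint8)
doubled = a * 2   # the Python int 2 does not widen the array's dtype

print(doubled, doubled.dtype)   # [144] uint8
print(400 % 256)                # 144, the same wrapped value
```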
To address your final question about retaining the dtype of a scalar during operations, you'll have to restrict the datatypes of any other scalars you use, to avoid NumPy's automatic dtype promotion:
>>> np.uint8(200) * np.uint8(2)
144
The alternative, of course, is to simply wrap the single value in a NumPy array (and then NumPy won't cast it in operations with scalars of a different dtype).
To promote the type of an array during an operation, you could wrap any scalars in an array first:
>>> np.array([200], dtype=np.uint8) * np.array([2])
array([400])
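Another option, not mentioned above and offered only as a sketch, is to widen the array explicitly with astype before operating on it. This makes the target dtype a deliberate choice instead of relying on the promotion rules. The choice of np.uint16 here is just an example of a type wide enough to hold 400.

```python
import numpy as np

a = np.array([200], dtype=np.uint8)

# astype returns a converted copy; the original array is unchanged
wide = a.astype(np.uint16) * 2

print(wide, wide.dtype)   # [400] uint16
print(a, a.dtype)         # [200] uint8
```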