Difference between np.int, np.int_, int, and np.int_t in Cython?

Question

I am struggling a bit with the many int data types in Cython:

np.int, np.int_, np.int_t, int

I guess int in pure Python is equivalent to np.int_, so where does np.int come from? I cannot find it in the NumPy documentation. Also, why does np.int_ exist given that we already have int?

In Cython, I guess int becomes a C type when used as cdef int or ndarray[int], and when used as int() it stays the Python caster?

Is np.int_ equivalent to long in C? So is cdef long identical to cdef np.int_?

Under what circumstances should I use np.int_t instead of np.int? e.g. cdef np.int_t, ndarray[np.int_t] ...

Can someone briefly explain how the wrong use of those types would affect the performance of compiled Cython code?

Solution

It's a bit complicated because the names have different meanings depending on the context.

int

  1. In Python

    Here int is just the normal Python type. It has arbitrary precision, meaning that you can store any conceivable integer in it (as long as you have enough memory).

    >>> int(10**50)
    100000000000000000000000000000000000000000000000000
    

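  2. However, when you use it as the dtype of a NumPy array, it is interpreted as np.int_ [1]. That type is not of arbitrary precision; it has the same size as C's long (from NumPy 2.0 on, the default integer is pointer-sized instead, which only differs on 64-bit Windows):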

    >>> np.array(10**50, dtype=int)
    OverflowError: Python int too large to convert to C long
    

    That also means the following two are equivalent:

    np.array([1,2,3], dtype=int)
    np.array([1,2,3], dtype=np.int_)
    

  3. As a Cython type identifier it has another meaning: here it stands for the C type int. It has limited precision (typically 32 bits). You can use it as a Cython type, for example when defining variables with cdef:

    cdef int value = 100    # variable
    cdef int[:] arr = ...   # memoryview
    

    As the return type or an argument type of cdef or cpdef functions:

    cdef int my_function(int argument1, int argument2):
        # ...
    

    As the "generic" element type of an ndarray:

    cimport numpy as cnp
    cdef cnp.ndarray[int, ndim=1] val = ...
    

    For type casting:

    avalue = <int>(another_value)
    

    And probably many more.

  4. In Cython, but as a Python type. You can still call int and you'll get a "Python int" (of arbitrary precision), or use it for isinstance or as the dtype argument for np.array. The context matters here, so converting to a Python int is different from converting to a C int:

    cdef object py_val = int(10)  # Python int
    cdef int c_val = <int>(10)    # C int
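
The difference between the first two meanings can be checked from plain Python. The following is a minimal sketch, not part of the original answer; the variable names are made up for illustration, and the width reported for dtype=int depends on the platform and NumPy version.

```python
import numpy as np

# Python's int has arbitrary precision.
big = 10**50
python_bits = big.bit_length()            # far more than 64 bits

# As a dtype, int maps to NumPy's fixed-width default integer.
dtype_bits = np.dtype(int).itemsize * 8   # usually 32 or 64

# A value that does not fit into the fixed width is rejected.
try:
    np.array(big, dtype=int)
    overflowed = False
except OverflowError:
    overflowed = True

print(python_bits, dtype_bits, overflowed)
```

The same name therefore gives an unbounded integer in one context and a fixed-width one in the other, which is why the context matters so much once C types enter the picture.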
    

np.int

Actually this is very easy. In NumPy versions before 1.24 it's just an alias for int (the alias was deprecated in 1.20 and removed in 1.24):

>>> int is np.int
True
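
The identity check above only works on older NumPy releases; to my knowledge the np.int alias was deprecated in NumPy 1.20 and removed in 1.24. A small sketch (the name np_int is hypothetical) that probes for the alias without failing on newer versions:

```python
import warnings
import numpy as np

# Look the alias up defensively: on NumPy >= 1.24 the attribute is gone,
# on 1.20 to 1.23 accessing it emits a DeprecationWarning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    np_int = getattr(np, "int", None)

if np_int is None:
    print("np.int is not available in NumPy", np.__version__)
else:
    print("np.int is int:", np_int is int)
```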

So everything from above applies to np.int as well. However, you can't use it as a type identifier, except when you use it on the cimported package. In that case it represents the Python integer type.

cimport numpy as cnp

cpdef func(cnp.int obj):
    return obj

This will expect obj to be a Python integer, not a NumPy type:

>>> func(np.int_(10))
TypeError: Argument 'obj' has incorrect type (expected int, got numpy.int32)
>>> func(10)
10
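
The TypeError has a pure-Python counterpart: on Python 3 a NumPy integer scalar is not an instance of the built-in int. A short sketch with illustrative variable names:

```python
import numpy as np

value = np.int_(10)

# A NumPy scalar is not a Python int on Python 3 ...
is_python_int = isinstance(value, int)

# ... but it is a NumPy integer.
is_numpy_integer = isinstance(value, np.integer)

# An explicit conversion yields a real Python int.
converted = int(value)

print(is_python_int, is_numpy_integer, type(converted))
```

Converting with int() before the call is one way to pass such a value to a function that is typed to accept only Python integers.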

My advice regarding np.int: avoid it whenever possible. In Python code it's equivalent to int, and in Cython code it's also equivalent to Python's int, but if used as a type identifier it will probably confuse you and everyone who reads the code! It certainly confused me...

np.int_

Actually it only has one meaning: it's a Python type that represents a scalar NumPy type. You use it like Python's int:

>>> np.int_(10)        # looks like a normal Python integer
10
>>> type(np.int_(10))  # but isn't (output may vary depending on your system!)
numpy.int32

Or you use it to specify the dtype, for example with np.array:

>>> np.array([1,2,3], dtype=np.int_)
array([1, 2, 3])
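
Because the concrete type behind np.int_ varies between systems, it can help to inspect it rather than assume a width. A minimal sketch using np.iinfo; the variable names are illustrative only:

```python
import numpy as np

scalar = np.int_(10)
concrete_type = type(scalar)        # for example numpy.int64 or numpy.int32

# np.iinfo reports the representable range of a fixed-width integer type.
limits = np.iinfo(np.int_)

arr = np.array([1, 2, 3], dtype=np.int_)

print(concrete_type.__name__, limits.min, limits.max, arr.dtype)
```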

But you cannot use it as a type identifier in Cython.

cnp.int_t

It's the type-identifier version of np.int_. That means you can't use it as a dtype argument, but you can use it as the type in cdef declarations:

cimport numpy as cnp
import numpy as np

cdef cnp.int_t[:] arr = np.array([1,2,3], dtype=np.int_)
#    |---TYPE---|                         |---DTYPE---|

This example (hopefully) shows that the type identifier with the trailing _t represents the element type of an array whose dtype is the name without the trailing t. You can't interchange them in Cython code!
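
One way to see why the type and the dtype have to agree: Cython's typed memoryviews and buffers obtain the array through the buffer protocol and compare its element format with the declared C type. The sketch below is pure Python rather than Cython and only inspects the format codes that NumPy exports; buffer_format is a hypothetical helper written for this illustration.

```python
import numpy as np

def buffer_format(dtype):
    """Return the buffer-protocol format code NumPy exports for a dtype."""
    return memoryview(np.zeros(3, dtype=dtype)).format

fmt_intc = buffer_format(np.intc)        # 'i', the code for C int
fmt_default = buffer_format(np.int_)     # 'l' (C long) or 'q' (C long long), platform dependent
fmt_double = buffer_format(np.float64)   # 'd', the code for C double

print(fmt_intc, fmt_default, fmt_double)
```

If the declared element type and the exported format do not describe the same C type, Cython rejects the assignment at run time with a buffer dtype mismatch error instead of silently reinterpreting the memory.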

Notes

There are several more numeric types in NumPy. Here is a list containing the NumPy dtype, the Cython type identifier, and the C type identifier that could also be used in Cython. It's basically taken from the NumPy documentation and the Cython NumPy pxd file:

NumPy dtype          NumPy Cython type         C type identifier in Cython

np.bool_             None                      None
np.int_              cnp.int_t                 long
np.intc              None                      int       
np.intp              cnp.intp_t                ssize_t
np.int8              cnp.int8_t                signed char
np.int16             cnp.int16_t               signed short
np.int32             cnp.int32_t               signed int
np.int64             cnp.int64_t               signed long long
np.uint8             cnp.uint8_t               unsigned char
np.uint16            cnp.uint16_t              unsigned short
np.uint32            cnp.uint32_t              unsigned int
np.uint64            cnp.uint64_t              unsigned long long
np.float_            cnp.float64_t             double
np.float32           cnp.float32_t             float
np.float64           cnp.float64_t             double
np.complex_          cnp.complex128_t          double complex
np.complex64         cnp.complex64_t           float complex
np.complex128        cnp.complex128_t          double complex
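
The size correspondences in the table can be spot-checked from Python with ctypes. This is a rough sketch under the assumption of a mainstream platform (8-bit char, 16-bit short, 32-bit int, 64-bit long long); np.int_ is left out on purpose because its relation to C long depends on the platform and NumPy version.

```python
import ctypes
import numpy as np

# (NumPy dtype, ctypes counterpart of the C type listed in the table)
pairs = [
    (np.intc, ctypes.c_int),
    (np.intp, ctypes.c_ssize_t),
    (np.int8, ctypes.c_byte),
    (np.int16, ctypes.c_short),
    (np.int32, ctypes.c_int),
    (np.int64, ctypes.c_longlong),
    (np.uint8, ctypes.c_ubyte),
    (np.uint16, ctypes.c_ushort),
    (np.uint32, ctypes.c_uint),
    (np.uint64, ctypes.c_ulonglong),
    (np.float32, ctypes.c_float),
    (np.float64, ctypes.c_double),
]

report = []
for dtype, ctype in pairs:
    np_size = np.dtype(dtype).itemsize
    c_size = ctypes.sizeof(ctype)
    report.append((np.dtype(dtype).name, ctype.__name__, np_size == c_size))

for name, c_name, same_size in report:
    print(f"{name:10s} {c_name:14s} same size: {same_size}")
```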

Actually there are Cython types for np.bool_: cnp.npy_bool and bint, but neither of them can currently be used for NumPy arrays. For scalars, cnp.npy_bool will just be an unsigned integer while bint will be a boolean. Not sure what's going on there...


[1] Taken from the NumPy documentation, "Data type objects":

Built-in Python types

Several Python types are equivalent to a corresponding array scalar when used to generate a dtype object:

int           np.int_
bool          np.bool_
float         np.float_
complex       np.cfloat
bytes         np.bytes_
str           np.bytes_ (Python2) or np.unicode_ (Python3)
unicode       np.unicode_
buffer        np.void
(all others)  np.object_
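
The mapping in this table can be verified by building dtype objects from the built-in types. A minimal sketch for Python 3; it uses np.float64, np.complex128 and np.str_ in place of np.float_, np.cfloat and np.unicode_, since to my knowledge those older aliases were removed in NumPy 2.0. The dictionary name is made up for the example.

```python
import numpy as np

# Built-in Python type -> NumPy array scalar type it should map to.
builtin_to_scalar = {
    int: np.int_,
    bool: np.bool_,
    float: np.float64,
    complex: np.complex128,
    bytes: np.bytes_,
    str: np.str_,
    object: np.object_,
}

results = {
    py_type.__name__: np.dtype(py_type) == np.dtype(np_type)
    for py_type, np_type in builtin_to_scalar.items()
}

print(results)
```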
