Declaring a numpy boolean mask in Cython


Question

How should I declare the type of a boolean mask in Cython? Do I actually need to declare it? Here is the example:

cpdef my_func(np.ndarray[np.double_t, ndim = 2] array_a,
            np.ndarray[np.double_t, ndim = 2] array_b,
            np.ndarray[np.double_t, ndim = 2] array_c):

    mask = (array_a > 1) & (array_b == 2) & (array_c == 3)
    array_a[mask] = 0.
    array_b[mask] = array_c[mask]
    return array_a, array_b, array_c

Recommended answer

You need to declare the mask as np.ndarray[np.uint8_t, ndim = 2, cast=True] mask = ..., i.e.:

cimport numpy as np
cpdef my_func(np.ndarray[np.double_t, ndim = 2] array_a,
            np.ndarray[np.double_t, ndim = 2] array_b,
            np.ndarray[np.double_t, ndim = 2] array_c):
    cdef np.ndarray[np.uint8_t, ndim = 2, cast=True] mask = (array_a > 1) & (array_b == 2) & (array_c == 3)
    array_a[mask] = 0.
    array_b[mask] = array_c[mask]
    return array_a, array_b, array_c

Otherwise (without cast=True) the code compiles but raises an error at runtime because of the type mismatch.
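The mismatch can be seen from plain Python, without compiling anything: a comparison produces an array whose dtype is bool, and NumPy treats that as a different dtype from uint8. A minimal sketch (the array contents are made up for illustration):

```python
import numpy as np

array_a = np.array([[0.5, 1.5], [2.5, 0.0]])
mask = array_a > 1

mask_is_bool = mask.dtype == np.bool_    # comparison results are bool arrays
mask_is_uint8 = mask.dtype == np.uint8   # bool is not the same dtype as uint8
print(mask.dtype, mask_is_bool, mask_is_uint8)
```

A buffer declared as np.uint8_t expects the second dtype, which is why the declaration needs cast=True to accept the mask.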

However, you don't need to declare the type of mask at all; you can use it as a Python object. There will be some performance penalty or, more precisely, a missed opportunity to speed things up a little through early type binding, but in your case it probably doesn't matter anyway.

One more thing: I don't know what your real code looks like, but I hope you are aware that Cython won't speed up your example at all: there is nothing to gain compared to numpy.
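For comparison, here is the same logic as an ordinary Python function. Every step is a whole-array NumPy operation, so the work is done inside NumPy either way. The name my_func_numpy and the sample arrays are illustrative, not part of the original question:

```python
import numpy as np

def my_func_numpy(array_a, array_b, array_c):
    """Same masking logic as my_func, in plain NumPy (modifies inputs in place)."""
    mask = (array_a > 1) & (array_b == 2) & (array_c == 3)
    array_a[mask] = 0.
    array_b[mask] = array_c[mask]
    return array_a, array_b, array_c

a = np.array([[2.0, 0.0], [5.0, 1.0]])
b = np.array([[2.0, 2.0], [2.0, 2.0]])
c = np.array([[3.0, 3.0], [4.0, 3.0]])
a, b, c = my_func_numpy(a, b, c)
print(a)
print(b)
```

Only the top-left element satisfies all three conditions, so only that position is changed in a and b.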

We can easily verify that a boolean np.array uses 8 bits per value (at least on my system). This is not obvious at all; for example, it could use only one bit per value (much like a bitset):

>>> import sys
>>> import numpy as np
>>> a = np.random.random((10000,))
>>> sys.getsizeof(a)
80096
>>> sys.getsizeof(a < .5)
10096

It is clear that the double array needs 8 bytes per element plus 96 bytes of overhead, while the mask needs only one byte per element.
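The per-element size can also be read directly from the array attributes, which avoids the object overhead that sys.getsizeof includes. As a sketch, np.packbits shows what a bitset-like layout of the same mask would occupy:

```python
import numpy as np

a = np.random.random((10000,))
mask = a < .5

print(a.itemsize, a.nbytes)        # bytes per element and total data bytes for float64
print(mask.itemsize, mask.nbytes)  # same for the boolean mask

packed = np.packbits(mask)         # bitset-like packing: 8 mask values per byte
print(packed.nbytes)
```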

We can also see that False is represented by 0 and True by 1:

print((a < .5).view(np.uint8))
[1 0 1 ..., 0 0 1]

Using cast=True makes it possible to access the raw bytes in the underlying array, a kind of reinterpret_cast of the array memory.
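A rough NumPy-level analogue of this reinterpretation is ndarray.view, which relabels the same bytes with another dtype, in contrast to astype, which converts into a new array. This is only an analogy for illustration, not a description of how Cython implements cast=True:

```python
import numpy as np

mask = np.array([True, False, True])

raw = mask.view(np.uint8)        # reinterpret the same bytes as uint8, no copy
copied = mask.astype(np.uint8)   # convert into a new, independent uint8 array

raw[1] = 1                       # writing through the view changes the mask
print(mask, np.shares_memory(mask, raw), np.shares_memory(mask, copied))
```

After the write, the middle element of mask is True as well, while copied still holds the values from before the write.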

Here is some, albeit somewhat old, information.
