cython: memory view of ndarray of strings (or direct ndarray indexing)

Question

How does one specify a memoryview of an ndarray that contains strings?

Declarations such as char[:] and char*[:] do not work.

To illustrate, my problem is the definition of function abc(...):

cdef void abc(char[:] in_buffer):
    cdef char * element
    element = address(in_buffer[1])
    ...

def main():
    cdef Py_ssize_t i, n = 100

    a = np.array(['ABC', 'D', 'EFGHI'])
    for i in range(n):
        abc(a)

If a memoryview is not possible, can I implement direct array access myself? I need to avoid the GIL for function abc(...).

Edit 1: In response to Bi Rico's answer.

My aim is to release the GIL for function abc(...) and, inside it, process the string elements of the ndarray in_buffer with C string functions, i.e. something like the following:

cdef void abc(char[:, ::1] in_buffer) nogil:
    cdef int max_elt_length = in_buffer.shape[1]+1
    cdef char element[max_elt_length+1]
    cdef int length

    for i in range(in_buffer.shape[0]+1):  # is this equivalent to in_buffer.dtype.itemsize + 1 ?
       element[max_elt_length] = 0   # add null-terminator for full-size elements
       memcpy(element, address(buffer[i, 0]), max_length)
       length = strlen(element)
       ...

Recommended answer

The issue is that NumPy array dtypes must have a fixed size. When you make an array of "strings", you are actually making an array of fixed-length char arrays, each as wide as the longest element at creation time. Try this:

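import numpy as np

# Byte strings (dtype 'S'): one byte per character, as C string functions expect.
array = np.array([b"cat", b"in", b"a", b"hat"])
array[2] = b"Seuss"  # silently truncated to the 3-byte item size
print(array)
# [b'cat' b'in' b'Seu' b'hat']
print(array.dtype)
# |S3
print(array.dtype.itemsize)
# 3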
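
To see the fixed-width layout directly, the following sketch (not part of the original answer) dumps the raw bytes of a small byte-string array. Shorter elements are padded with NUL bytes, while an element that fills the whole item has no terminator at all, which matters if the bytes are later handed to C string functions. The variable names are only illustrative.

```python
import numpy as np

# Byte strings, so that one character occupies exactly one byte.
a = np.array([b"ABC", b"D", b"EFGHI"])

print(a.dtype)           # |S5: every element occupies 5 bytes
print(a.dtype.itemsize)  # 5

# Raw memory of the array: elements sit back to back, NUL-padded to 5 bytes.
raw = a.tobytes()
print(raw)               # b'ABC\x00\x00D\x00\x00\x00\x00EFGHI'
```

Note that the last element, b"EFGHI", is followed by no NUL byte, so a plain strlen on it would read past the end of the buffer.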

With that in mind, you could do something like this:

cdef void abc(char[:, ::1] in_buffer):
    # Pointer to the first byte of the second element (row 1, column 0).
    cdef char *element = &in_buffer[1, 0]

Then when you pass your arrays to abc you'll need to do something like:

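# Byte strings, so that each character is a single byte in the view.
a = np.array([b'ABC', b'D', b'EFGHI'])
# Reinterpret the data as bytes: one row per element, one column per byte.
array_view = a.view('uint8').reshape(a.size, a.dtype.itemsize)
abc(array_view)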
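
As a rough illustration of what abc receives, the sketch below builds the same 2-D view with NumPy and inspects it from Python. It assumes a byte-string array (dtype S); a Python 3 str array has dtype U with four bytes per character, so the second dimension would be four times larger. The helper element_length is hypothetical and only mimics what a length-bounded strlen would compute for each row.

```python
import numpy as np

a = np.array([b"ABC", b"D", b"EFGHI"])
view = a.view("uint8").reshape(a.size, a.dtype.itemsize)

print(view.shape)  # (3, 5): one row per element, one column per byte


def element_length(row):
    """Hypothetical helper: bytes before the first NUL, capped at the row width."""
    nul_positions = np.flatnonzero(row == 0)
    return int(nul_positions[0]) if nul_positions.size else row.shape[0]


lengths = [element_length(row) for row in view]
print(lengths)  # [3, 1, 5]

# The view shares memory with the original array, so no data is copied.
view[1, 0] = ord("X")
print(a[1])  # b'X'
```

Because the view is C-contiguous and shares memory with a, it matches the char[:, ::1] signature without copying, and writes made through it are visible in the original array.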

This is only one approach, but it's the one I would recommend without knowing more about what you're trying to do.
