Getting data from ctypes array into numpy

Problem description

I am using a C library wrapped for Python (via ctypes) to run a series of computations. At different stages of the run, I want to get data into Python, specifically into numpy arrays.

The wrapper I am using returns array data in two different ways (which is what I am particularly interested in):

  • ctypes Array: When I do type(x) (where x is the ctypes array), I get a <class 'module_name.wrapper_class_name.c_double_Array_12000'> in return. I know from the documentation that this data is a copy of the internal data, and I can get it into a numpy array easily:

    >>> np.ctypeslib.as_array(x)
    

This returns a 1D numpy array of the data.

  • ctypes pointer to data: In this case, I understand from the library's documentation that I am getting a pointer to the data stored and used directly by the library. When I do type(y) (where y is the pointer) I get <class 'module_name.wrapper_class_name.LP_c_double'>. In this case I am still able to index through the data like y[0][2], but I was only able to get it into numpy via a super awkward:

    >>> np.frombuffer(np.core.multiarray.int_asbuffer(
        ctypes.addressof(y.contents), array_length*np.dtype(float).itemsize))
    

I found this in an old numpy mailing list thread from Travis Oliphant, but not in the numpy documentation. If I try the same approach as above instead, I get the following:

>>> np.ctypeslib.as_array(y)
...
...  BUNCH OF STACK INFORMATION
...
AttributeError: 'LP_c_double' object has no attribute '__array_interface__'

Is this np.frombuffer approach the best or only way to do this? I am open to other suggestions, but I would still like to use numpy, as I have a lot of other post-processing code that relies on numpy functionality that I want to use with this data.

Solution

Creating NumPy arrays from a ctypes pointer object is a problematic operation. It is unclear who actually owns the memory the pointer is pointing to. When will it be freed again? How long is it valid? Whenever possible I would try to avoid this kind of construct. It is so much easier and safer to create arrays in the Python code and pass them to the C function than to use memory allocated by a Python-unaware C function. By doing the latter, you negate to some extent the advantages of having a high-level language taking care of the memory management.
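As a rough illustration of the "allocate in Python, let C fill it" pattern recommended above, here is a minimal sketch. It assumes the C routine accepts a double* output buffer; since the actual library is not shown in the question, ctypes.memmove stands in for that routine, and the names out, out_ptr and src are purely illustrative.

```python
import ctypes
import numpy as np

n = 5

# The array is allocated by NumPy, so Python owns the memory and frees it
# once the array is garbage collected.
out = np.zeros(n, dtype=np.float64)

# Pointer that could be handed to a C function expecting a double*.
out_ptr = out.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

# Stand-in for the real C routine (an assumption for this sketch):
# copy n doubles into the buffer that NumPy owns.
src = (ctypes.c_double * n)(1.0, 2.0, 3.0, 4.0, 5.0)
ctypes.memmove(out_ptr, src, ctypes.sizeof(src))

print(out)
```

With this layout there is no question about who owns the memory: the data lives in out for as long as out is referenced from Python.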

If you are really sure that someone takes care of the memory, you can create an object exposing the Python "buffer protocol" and then create a NumPy array using this buffer object. You gave one way of creating the buffer object in your post, via the undocumented int_asbuffer() function:

buffer = numpy.core.multiarray.int_asbuffer(
    ctypes.addressof(y.contents), 8*array_length)

(Note that I substituted 8 for np.dtype(float).itemsize. It's always 8, on any platform.) A different way to create the buffer object would be to call the PyBuffer_FromMemory() function from the Python C API via ctypes:

buffer_from_memory = ctypes.pythonapi.PyBuffer_FromMemory
buffer_from_memory.restype = ctypes.py_object
buffer = buffer_from_memory(y, 8*array_length)

For both these ways, you can create a NumPy array from buffer by

a = numpy.frombuffer(buffer, float)

(I actually do not understand why you use .astype() instead of a second parameter to frombuffer; furthermore, I wonder why you use np.int, while you said earlier that the array contains doubles.)
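PyBuffer_FromMemory() belongs to the Python 2 C API, and int_asbuffer() may not be available on current NumPy installations. On Python 3, the closest counterpart appears to be PyMemoryView_FromMemory(). The following is a sketch under that assumption (CPython only); the backing array and the pointer y are stand-ins for what the wrapped library would return, and 0x100 is the value of the PyBUF_READ flag from the C headers.

```python
import ctypes
import numpy as np

PyBUF_READ = 0x100  # read-only flag value from the CPython headers

memview_from_memory = ctypes.pythonapi.PyMemoryView_FromMemory
memview_from_memory.restype = ctypes.py_object
memview_from_memory.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int]

# Stand-in for the pointer returned by the library (illustrative only).
array_length = 3
backing = (ctypes.c_double * array_length)(1.0, 2.0, 3.0)
y = ctypes.cast(backing, ctypes.POINTER(ctypes.c_double))

# Wrap the raw memory in a memoryview, then let NumPy interpret it as doubles.
nbytes = array_length * ctypes.sizeof(ctypes.c_double)
buf = memview_from_memory(y, nbytes, PyBUF_READ)
a = np.frombuffer(buf, dtype=float)

print(a)
```

The resulting array is a read-only view of the library's memory, so it is only valid for as long as that memory is.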

I'm afraid it won't get much easier than this, but it isn't that bad, don't you think? You could bury all the ugly details in a wrapper function and not worry about it any more.
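One possible shape for such a wrapper is sketched below. It relies on np.ctypeslib.as_array(), which in current NumPy versions accepts a ctypes pointer as long as the shape argument is given. The function name pointer_to_array and its copy flag are hypothetical, and the backing array only simulates the memory a library would own.

```python
import ctypes
import numpy as np

def pointer_to_array(ptr, length, copy=False):
    """Expose `length` elements behind a ctypes pointer as a 1D NumPy array.

    With copy=False the result is a view of the library's memory and is only
    valid while that memory is; copy=True returns an independent snapshot.
    """
    arr = np.ctypeslib.as_array(ptr, shape=(length,))
    return arr.copy() if copy else arr

# Stand-in for a pointer handed out by the wrapped library (illustrative only).
backing = (ctypes.c_double * 4)(0.5, 1.5, 2.5, 3.5)
y = ctypes.cast(backing, ctypes.POINTER(ctypes.c_double))

view = pointer_to_array(y, 4)                  # shares memory with `backing`
snapshot = pointer_to_array(y, 4, copy=True)   # independent copy

backing[0] = 9.0  # simulate the library updating its data in place
print(view[0], snapshot[0])
```

Taking a copy is the more defensive choice when it is unclear how long the library keeps the underlying memory alive.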
