Safer way to expose a C-allocated memory buffer using numpy/ctypes?

Question

I'm writing Python bindings for a C library that uses shared memory buffers to store its internal state. The allocation and freeing of these buffers is done outside of Python by the library itself, but I can indirectly control when this happens by calling wrapped constructor/destructor functions from within Python. I'd like to expose some of the buffers to Python so that I can read from them, and in some cases push values to them. Performance and memory use are important concerns, so I would like to avoid copying data wherever possible.

My current approach is to create a numpy array that provides a direct view onto a ctypes pointer:

import numpy as np
import ctypes as C

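libc = C.CDLL('libc.so.6')
# declare pointer-sized signatures; ctypes defaults to C int, which would
# truncate 64-bit addresses
libc.malloc.restype = C.c_void_p
libc.malloc.argtypes = [C.c_size_t]
libc.free.argtypes = [C.c_void_p]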

class MyWrapper(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        addr = libc.malloc(C.sizeof(C.c_int) * n)
        self._cbuf = (C.c_int * n).from_address(addr)

    def __del__(self):
        # buffer freed by external library
        libc.free(C.addressof(self._cbuf))
        self._cbuf = None

    @property
    def buffer(self):
        return np.ctypeslib.as_array(self._cbuf)

As well as avoiding copies, this also means I can use numpy's indexing and assignment syntax and pass it directly to other numpy functions:

wrap = MyWrapper()
buf = wrap.buffer       # buf is now a writeable view of a C-allocated buffer

buf[:] = np.arange(10)  # this is pretty cool!
buf[::2] += 10

print(wrap.buffer)
# [10  1 12  3 14  5 16  7 18  9]

However, it's also inherently dangerous:

del wrap                # free the pointer

print(buf)              # this is bad!
# [1852404336 1969367156  538978662  538976288  538976288  538976288
#  1752440867 1763734377 1633820787       8548]

# buf[0] = 99           # uncomment this line if you <3 segfaults

To make this safer, I need to be able to check whether the underlying C pointer has been freed before I try to read/write to the array contents. I have a few thoughts on how to do this:

  • One way would be to generate a subclass of np.ndarray that holds a reference to the _cbuf attribute of MyWrapper, checks whether it is None before doing any reading/writing to its underlying memory, and raises an exception if this is the case.
  • I could easily generate multiple views onto the same buffer, e.g. by .view casting or slicing, so each of these would need to inherit the reference to _cbuf and the method that performs the check. I suspect that this could be achieved by overriding __array_finalize__, but I'm not sure exactly how.
  • The "pointer-checking" method would also need to be called before any operation that would read and/or write to the contents of the array. I don't know enough about numpy's internals to have an exhaustive list of methods to override.

How could I implement a subclass of np.ndarray that performs this check? Can anyone suggest a better approach?

Update: This class does most of what I want:

class SafeBufferView(np.ndarray):

    def __new__(cls, get_buffer, shape=None, dtype=None):
        obj = np.ctypeslib.as_array(get_buffer(), shape).view(cls)
        if dtype is not None:
            obj.dtype = dtype
        obj._get_buffer = get_buffer
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self._get_buffer = getattr(obj, "_get_buffer", None)

    def __array_prepare__(self, out_arr, context=None):
        if not self._get_buffer(): raise Exception("Dangling pointer!")
        return out_arr

    # this seems very heavy-handed - surely there must be a better way?
    def __getattribute__(self, name):
        if name not in ["__new__", "__array_finalize__", "__array_prepare__",
                        "__getattribute__", "_get_buffer"]:
            if not self._get_buffer(): raise Exception("Dangling pointer!")
        return super(np.ndarray, self).__getattribute__(name)

For example:

wrap = MyWrapper()
sb = SafeBufferView(lambda: wrap._cbuf)
sb[:] = np.arange(10)

print(repr(sb))
# SafeBufferView([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

print(repr(sb[::2]))
# SafeBufferView([0, 2, 4, 6, 8], dtype=int32)

sbv = sb.view(np.double)
print(repr(sbv))
# SafeBufferView([  2.12199579e-314,   6.36598737e-314,   1.06099790e-313,
#          1.48539705e-313,   1.90979621e-313])

# we have to call the destructor method of `wrap` explicitly - `del wrap` won't
# do anything because `sb` and `sbv` both hold references to `wrap`
wrap.__del__()

print(sb)                # Exception: Dangling pointer!
print(sb + 1)            # Exception: Dangling pointer!
print(sbv)               # Exception: Dangling pointer!
print(np.sum(sb))        # Exception: Dangling pointer!
print(sb.dot(sb))        # Exception: Dangling pointer!

print(np.dot(sb, sb))    # oops...
# -70104698

print(np.extract(np.ones(10), sb))
# array([251019024,     32522, 498870232,     32522,         4,         5,
#               6,         7,        48,         0], dtype=int32)

# np.copyto(sb, np.ones(10, np.int32))    # don't try this at home, kids!

I'm sure there are other edge cases I've missed.

Update 2: I've had a play around with weakref.proxy, as suggested by @ivan_pozdeev. It's a nice idea, but unfortunately I can't see how it would work with numpy arrays. I could try to create a weakref to the numpy array returned by .buffer:

import weakref

wrap = MyWrapper()
wr = weakref.proxy(wrap.buffer)
print(wr)
# ReferenceError: weakly-referenced object no longer exists
# <weakproxy at 0x7f6fe715efc8 to NoneType at 0x91a870>

I think the problem here is that the np.ndarray instance returned by wrap.buffer immediately goes out of scope. A workaround would be for the class to instantiate the array on initialization, hold a strong reference to it, and have the .buffer() getter return a weakref.proxy to the array:

class MyWrapper2(object):

    def __init__(self, n=10):
        # buffer allocated by external library
        addr = libc.malloc(C.sizeof(C.c_int) * n)
        self._cbuf = (C.c_int * n).from_address(addr)
        self._buffer = np.ctypeslib.as_array(self._cbuf)

    def __del__(self):
        # buffer freed by external library
        libc.free(C.addressof(self._cbuf))
        self._cbuf = None
        self._buffer = None

    @property
    def buffer(self):
        return weakref.proxy(self._buffer)

However, this breaks if I create a second view onto the same array whilst the buffer is still allocated:

wrap2 = MyWrapper2()
buf = wrap2.buffer
buf[:] = np.arange(10)

buf2 = buf[:]   # create a second view onto the contents of buf

print(repr(buf))
# <weakproxy at 0x7fec3e709b50 to numpy.ndarray at 0x210ac80>
print(repr(buf2))
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int32)

wrap2.__del__()

print(buf2[:])  # this is bad
# [1291716568    32748 1291716568    32748        0        0        0
#         0       48        0] 

print(buf[:])   # WTF?!
# [34525664        0        0        0        0        0        0        0
#         0        0]  

This is seriously broken - after calling wrap2.__del__() not only can I read and write to buf2 which was a numpy array view onto wrap2._cbuf, but I can even read and write to buf, which should not be possible given that wrap2.__del__() sets wrap2._buffer to None.
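
One plausible contributing factor, sketched here with plain numpy arrays rather than the C buffer: slicing through a weakref.proxy returns an ordinary ndarray, and that view can hold a strong reference to the array it was sliced from. The example assumes an array that owns its data (created with np.arange); how the base chain looks for an array built on a ctypes buffer may differ between numpy versions, so treat this as an illustration rather than a diagnosis.

```python
import weakref
import numpy as np

# A temporary array has no other owner, so the weak reference is dead as
# soon as the statement finishes.
temp_ref = weakref.ref(np.arange(5))
temp_is_dead = temp_ref() is None

# With a strong reference held elsewhere, the proxy is usable...
owner = np.arange(5)            # this array owns its data
owner_ref = weakref.ref(owner)
proxy = weakref.proxy(owner)

# ...but slicing through the proxy hands back an ordinary ndarray.
view = proxy[:]
view_is_plain = type(view) is np.ndarray

# The view's .base is a strong reference to the original array, so dropping
# the last named reference does not invalidate the proxy.
del owner
alive_via_view = owner_ref() is not None

# Only once the view is gone does the original array get collected.
del view
dead_after_view = owner_ref() is None
```

So a proxy only protects the one object it wraps; any view derived from it is a normal strong reference to the same memory.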

Recommended answer

You have to keep a reference to your wrapper for as long as any numpy array onto its buffer exists. The easiest way to achieve this is to save the reference in an attribute of the ctypes buffer:

class MyWrapper(object):
    def __init__(self, n=10):
        # buffer allocated by external library
        self.size = n
        self.addr = libc.malloc(C.sizeof(C.c_int) * n)

    def __del__(self):
        # buffer freed by external library
        libc.free(self.addr)

    @property
    def buffer(self):
        buf = (C.c_int * self.size).from_address(self.addr)
        buf._wrapper = self
        return np.ctypeslib.as_array(buf)

This way your wrapper is automatically freed when the last reference, e.g. the last numpy array, is garbage collected.
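
A minimal sketch of how this lifetime behaviour could be checked without touching real freed memory. The names fake_alloc, fake_free, _live_blocks and Owner are hypothetical stand-ins for the external library and the wrapper, not part of numpy or ctypes; the "library" here just keeps blocks in a dict so that "freed" can be observed safely. It assumes CPython's reference counting and that numpy keeps the ctypes object alive through the array's base, which is my understanding of current behaviour.

```python
import ctypes as C
import gc
import numpy as np

# Hypothetical stand-in for the external library: "allocated" blocks live in
# this dict until "freed", so the example never reads released memory.
_live_blocks = {}

def fake_alloc(nbytes):
    block = C.create_string_buffer(nbytes)
    addr = C.addressof(block)
    _live_blocks[addr] = block
    return addr

def fake_free(addr):
    del _live_blocks[addr]

class Owner(object):
    def __init__(self, n=10):
        self.size = n
        self.addr = fake_alloc(C.sizeof(C.c_int) * n)

    def __del__(self):
        fake_free(self.addr)

    @property
    def buffer(self):
        buf = (C.c_int * self.size).from_address(self.addr)
        buf._owner = self          # the ctypes object keeps the owner alive
        return np.ctypeslib.as_array(buf)

owner = Owner()
addr = owner.addr
arr = owner.buffer
view = arr[::2]                    # a derived view of the same memory

# Drop every name except the slice; the buffer must survive.
del owner, arr
gc.collect()
still_allocated = addr in _live_blocks

view[:] = 7                        # safe: the memory has not been released
del view
gc.collect()
released = addr not in _live_blocks
```

The trade-off is that the buffer can no longer be freed early while any array or view is still around, which is the point: the free is deferred instead of leaving a dangling pointer.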
