Cython typed memoryviews: what are they really?


Question

The Cython documentation explains very well what they allow for, how you can declare them, and how to use them.

However, it is still not clear to me what they really are. For example, a simple assignment from a numpy array like this:

my_arr = np.empty(10, np.int32)
cdef int [:] new_arr = my_arr

can make the accessing/assignment of my_arr faster.

What is happening behind the scenes? Numpy should already allocate the elements in memory in a contiguous fashion, so what's the deal with memoryviews? Apparently not that much; in fact, the memoryview assignment of the numpy array new_arr should be equivalent to

cdef np.ndarray[np.int32_t, ndim=1] new_arr = np.empty(10, np.int32)

in terms of speed. However, memoryviews are considered more general than the numpy array buffer; could you give a simple example in which the added 'generalization' is important/interesting?

Furthermore, if I have already allocated a pointer in order to make things as fast as possible, what is the advantage of casting it to a typed memoryview? (The answer to this question might be the same as the one above.)

cdef int *my_arr = <int *> malloc(N * sizeof(int))
cdef int[:] new_arr = <int[:N]>my_arr

Recommended answer

What is a memoryview:

When you write in a function:

cdef double[:] a

you end up with a __Pyx_memviewslice object:

typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  Py_ssize_t shape[8];
  Py_ssize_t strides[8];
  Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;

The memoryview contains a C pointer to some data which it (usually) doesn't directly own. It also contains a pointer to an underlying Python object (struct __pyx_memoryview_obj *memview;). If the data is owned by a Python object then memview holds a reference to that and ensures the Python object that holds the data is kept alive as long as the memoryview is around.
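The keep-alive behaviour can be observed from plain Python. This is only an analogy, assuming Python's built-in memoryview behaves like Cython's typed memoryview in this respect (both sit on the buffer protocol). The Owner class is a hypothetical helper, introduced only because plain bytearray objects cannot be weakly referenced:

```python
import weakref


class Owner(bytearray):
    """Hypothetical bytearray subclass that can be weakly referenced."""


owner = Owner(b"abcd")
alive = weakref.ref(owner)         # lets us observe when the owner is freed
view = memoryview(owner)           # the view takes a reference to the owner

del owner                          # drop our own reference to the data
still_alive = alive() is not None  # the view keeps the owner alive
first_byte = view[0]               # the data is still readable through the view

del view                           # drop the last view
freed = alive() is None            # relies on CPython's reference counting
```

Once the last view is gone the owner can be collected, which is the guarantee the memview pointer provides in the generated C code.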

The combination of the pointer to the raw data and the information on how to index it (shape, strides and suboffsets) allows Cython to do indexing using the raw data pointer and some simple C maths (which is very efficient). e.g.:

x=a[0]

gives something like:

(*((double *) ( /* dim=0 */ (__pyx_v_a.data + __pyx_t_2 * __pyx_v_a.strides[0]) )));
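The same offset arithmetic can be reproduced in plain Python as a rough illustration. The sketch below uses the built-in memoryview rather than a Cython typed memoryview, and item_at is a hypothetical helper that reads element i at byte offset i * strides[0], mirroring the data + i * strides[0] expression in the generated C:

```python
import struct
from array import array


def item_at(view, i):
    """Read element i of a 1-D buffer of C doubles via offset arithmetic.

    Hypothetical helper: a byte offset into the buffer stands in for the
    raw `data` pointer used by the generated C code.
    """
    raw = view.cast("B")                # the same memory, seen as bytes
    offset = i * view.strides[0]        # byte offset of element i
    return struct.unpack_from("d", raw, offset)[0]


a = array("d", [1.5, 2.5, 3.5])
view = memoryview(a)
print(view.shape, view.strides)         # shape and stride (in bytes) of the buffer
print(item_at(view, 2) == a[2])         # offset arithmetic agrees with normal indexing
```

In Cython this calculation is compiled to C and involves no Python calls, which is where the speed comes from.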

In contrast, if you work with untyped objects and write something like:

a = np.array([1,2,3]) # note no typedef
x = a[0]

the indexing is done as:

__Pyx_GetItemInt(__pyx_v_a, 0, long, 1, __Pyx_PyInt_From_long, 0, 0, 1);

which itself expands to a whole bunch of Python C-api calls (so is slow). Ultimately it calls a's __getitem__ method.

Compared to typed numpy arrays: there really isn't a huge difference. If you do something like:

cdef np.ndarray[np.int32_t, ndim=1] new_arr

it behaves very much like a memoryview, with access to raw pointers, and the speed should be very similar.

The advantage to using memoryviews is that you can use a wider range of array types with them (such as the standard library array), so you're more flexible about the types your functions can be called with. This fits in with the general Python idea of "duck-typing" - that your code should work with any parameter that behaves the right way (rather than checking the type).
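As a loose Python-level illustration of that flexibility, here is a minimal sketch assuming the built-in memoryview as a stand-in for a double[:] argument. The function total is hypothetical; it accepts any object that exports a 1-D buffer of C doubles and rejects other element types, much as a typed memoryview argument would:

```python
import struct
from array import array


def total(buf):
    """Sum any 1-D buffer of C doubles, whatever object exports it."""
    view = memoryview(buf)
    if view.ndim != 1 or view.format != "d":
        raise TypeError("expected a 1-D buffer of C doubles")
    return sum(view)


# A standard-library array works directly.
from_array = total(array("d", [1.0, 2.0, 3.0]))

# So do raw bytes, once they are viewed as doubles.
packed = bytearray(struct.pack("3d", 1.0, 2.0, 3.0))
from_bytes = total(memoryview(packed).cast("d"))

print(from_array, from_bytes)
```

The caller chooses the container; the function only depends on the layout of the data.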

A second (small) advantage is that you don't need the numpy headers to build your module.

A third (possibly larger) advantage is that memoryviews can be initialised without the GIL while cdef np.ndarrays can't (http://docs.cython.org/src/userguide/memoryviews.html#comparison-to-the-old-buffer-support)

A slight disadvantage to memoryviews is that they seem to be slightly slower to set up.

Compared to just using malloced int pointers:

You won't get any speed advantage (but neither will you get too much speed loss). The minor advantages of converting using a memoryview are:

  1. You can write functions that can be used either from Python or internally within Cython:

cpdef do_something_useful(double[:] x):
    # can be called from Python with any array type or from Cython
    # with something that's already a memoryview
    ....

  2. You can let Cython handle freeing the memory for this type of array, which could simplify your life for things that have an unknown lifetime. See http://docs.cython.org/src/userguide/memoryviews.html#cython-arrays and especially .callback_free_data.

  3. You can pass your data back to Python code (it'll get the underlying __pyx_memoryview_obj or something similar). Be very careful with memory management here (i.e. see point 2!).

  4. The other thing you can do is handle things like 2D arrays defined as pointer to pointer (e.g. double**). See http://docs.cython.org/src/userguide/memoryviews.html#specifying-more-general-memory-layouts. I generally don't like this type of array, but if you have existing C code that already uses it then you can interface with that (and pass it back to Python so your Python code can also use it).
