建议为类型化的内存视图分配内存的方法是什么? [英] What is the recommended way of allocating memory for a typed memory view?

查看:79
本文介绍了建议为类型化的内存视图分配内存的方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有关类型化内存视图的Cython文档列出了三种分配给键入的内存视图:

  1. 通过原始C指针
  2. 来自np.ndarray
  3. 来自cython.view.array.

假设我没有数据从外部传递到cython函数,而是想分配内存并将其作为np.ndarray返回,我选择了哪些选项?还要假设该缓冲区的大小不是编译时常量,即我无法在堆栈上分配,但对于选项1,需要malloc.

因此,这3个选项看起来像这样:

from libc.stdlib cimport malloc, free
cimport numpy as np
from cython cimport view

np.import_array()

def memview_malloc(int N):
    cdef int * m = <int *>malloc(N * sizeof(int))
    cdef int[::1] b = <int[:N]>m
    free(<void *>m)

def memview_ndarray(int N):
    cdef int[::1] b = np.empty(N, dtype=np.int32)

def memview_cyarray(int N):
    cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")

令我惊讶的是,在所有三种情况下, A 简单基准测试并没有揭示这三种方法之间的巨大差异,其中2.是最快的.

推荐使用三种方法中的哪一种?还是有其他更好的选择?

后续问题:在函数中使用该内存视图之后,我想最终以np.ndarray的形式返回结果.类型化的内存视图是最佳选择,还是我宁愿只使用如下所示的旧缓冲区接口来首先创建ndarray?

cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32)

解决方案

查看这里寻求答案.

基本思想是您想要cpython.array.arraycpython.array.clone( cython.array.*):

from cpython.array cimport array, clone

# This type is what you want and can be cast to things of
# the "double[:]" syntax, so no problems there
cdef array[double] armv, templatemv

templatemv = array('d')

# This is fast
armv = clone(templatemv, L, False)

编辑

事实证明,该线程中的基准是垃圾.这是我的设定,还有我的时间安排:

# cython: language_level=3
# cython: boundscheck=False
# cython: wraparound=False

import time
import sys

from cpython.array cimport array, clone
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free
import numpy as numpy
cimport numpy as numpy

cdef int loops

def timefunc(name):
    def timedecorator(f):
        cdef int L, i

        print("Running", name)
        for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:
            start = time.clock()
            f(L)
            end = time.clock()
            print(format((end-start) / loops * 1e6, "2f"), end=" ")
            sys.stdout.flush()

        print("μs")
    return timedecorator

print()
print("INITIALISATIONS")
loops = 100000

@timefunc("cpython.array buffer")
def _(int L):
    cdef int i
    cdef array[double] arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cpython.array memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr
    cdef array template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cpython.array raw C type")
def _(int L):
    cdef int i
    cdef array arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("numpy.empty_like memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr
    template = numpy.empty((L,), dtype='double')

    for i in range(loops):
        arr = numpy.empty_like(template)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("malloc")
def _(int L):
    cdef int i
    cdef double* arrptr

    for i in range(loops):
        arrptr = <double*> malloc(sizeof(double) * L)
        free(arrptr)

    # Prevents dead code elimination
    str(arrptr[0])

@timefunc("malloc memoryview")
def _(int L):
    cdef int i
    cdef double* arrptr
    cdef double[::1] arr

    for i in range(loops):
        arrptr = <double*> malloc(sizeof(double) * L)
        arr = <double[:L]>arrptr
        free(arrptr)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cvarray memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr

    for i in range(loops):
        arr = cvarray((L,),sizeof(double),'d')

    # Prevents dead code elimination
    str(arr[0])



print()
print("ITERATING")
loops = 1000

@timefunc("cpython.array buffer")
def _(int L):
    cdef int i
    cdef array[double] arr = clone(array('d'), L, False)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("cpython.array memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = clone(array('d'), L, False)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("cpython.array raw C type")
def _(int L):
    cdef int i
    cdef array arr = clone(array('d'), L, False)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("numpy.empty_like memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = numpy.empty((L,), dtype='double')

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("malloc")
def _(int L):
    cdef int i
    cdef double* arrptr = <double*> malloc(sizeof(double) * L)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arrptr[i]

    free(arrptr)

    # Prevents dead-code elimination
    str(d)

@timefunc("malloc memoryview")
def _(int L):
    cdef int i
    cdef double* arrptr = <double*> malloc(sizeof(double) * L)
    cdef double[::1] arr = <double[:L]>arrptr

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    free(arrptr)

    # Prevents dead-code elimination
    str(d)

@timefunc("cvarray memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = cvarray((L,),sizeof(double),'d')

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

输出:

INITIALISATIONS
Running cpython.array buffer
0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs
Running cpython.array memoryview
0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs
Running cpython.array raw C type
0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs
Running numpy.empty_like memoryview
1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs
Running malloc
0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs
Running malloc memoryview
1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs
Running cvarray memoryview
1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs

ITERATING
Running cpython.array buffer
0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs
Running cpython.array memoryview
0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs
Running cpython.array raw C type
0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs
Running numpy.empty_like memoryview
0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs
Running malloc
0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs
Running malloc memoryview
0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs
Running cvarray memoryview
0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs

(之所以使用迭代"基准,是因为某些方法在这方面具有令人惊讶的不同特征.)

按照初始化速度的顺序:

malloc:这是一个严酷的世界,但是很快.如果您需要分配很多东西并且具有不受阻碍的迭代和索引性能,那就必须如此.但是通常来说,您是一个不错的选择……

cpython.array raw C type:该死,很快.而且很安全.不幸的是,它通过Python来访问其数据字段.您可以使用一个绝妙的技巧来避免这种情况:

arr.data.as_doubles[i]

将其提升到标准速度,同时消除了安全!这使得它成为malloc精彩替代品,基本上是一个引用计数很高的版本!

cpython.array buffer:进入时间仅为malloc的三到四倍,这似乎是一个不错的选择.不幸的是,它有很大的开销(尽管与boundscheckwraparound指令相比很小).这意味着它实际上只能与完全安全的变体竞争,但是 是初始化速度最快的变体.您的选择.

cpython.array memoryview:现在,它比malloc初始化要慢一个数量级.太可惜了,但是迭代的速度一样快.这是我建议的标准解决方案,除非打开boundscheckwraparound(在这种情况下cpython.array buffer可能是更引人注目的折衷方案).

其余的.唯一有价值的东西是numpy,这是因为对象附带了许多有趣的方法.就是这样.

The Cython documentation on typed memory views list three ways of assigning to a typed memory view:

  1. from a raw C pointer,
  2. from a np.ndarray and
  3. from a cython.view.array.

Assume that I don't have data passed in to my cython function from outside but instead want to allocate memory and return it as a np.ndarray, which of those options do I chose? Also assume that the size of that buffer is not a compile-time constant i.e. I can't allocate on the stack, but would need to malloc for option 1.

The 3 options would therefore looke something like this:

from libc.stdlib cimport malloc, free
cimport numpy as np
from cython cimport view

np.import_array()

def memview_malloc(int N):
    cdef int * m = <int *>malloc(N * sizeof(int))
    cdef int[::1] b = <int[:N]>m
    free(<void *>m)

def memview_ndarray(int N):
    cdef int[::1] b = np.empty(N, dtype=np.int32)

def memview_cyarray(int N):
    cdef int[::1] b = view.array(shape=(N,), itemsize=sizeof(int), format="i")

What is surprising to me is that in all three cases, Cython generates quite a lot of code for the memory allocation, in particular a call to __Pyx_PyObject_to_MemoryviewSlice_dc_int. This suggests (and I might be wrong here, my insight into the inner workings of Cython are very limited) that it first creates a Python object and then "casts" it into a memory view, which seems unnecessary overhead.

A simple benchmark doesn't reveal much difference between the three methods, with 2. being the fastest by a thin margin.

Which of the three methods is recommended? Or is there a different, better option?

Follow-up question: I want to finally return the result as a np.ndarray, after having worked with that memory view in the function. Is a typed memory view the best choice or would I rather just use the old buffer interface as below to create an ndarray in the first place?

cdef np.ndarray[DTYPE_t, ndim=1] b = np.empty(N, dtype=np.int32)

解决方案

Look here for an answer.

The basic idea is that you want cpython.array.array and cpython.array.clone (not cython.array.*):

from cpython.array cimport array, clone

# This type is what you want and can be cast to things of
# the "double[:]" syntax, so no problems there
cdef array[double] armv, templatemv

templatemv = array('d')

# This is fast
armv = clone(templatemv, L, False)

EDIT

It turns out that the benchmarks in that thread were rubbish. Here's my set, with my timings:

# cython: language_level=3
# cython: boundscheck=False
# cython: wraparound=False

import time
import sys

from cpython.array cimport array, clone
from cython.view cimport array as cvarray
from libc.stdlib cimport malloc, free
import numpy as numpy
cimport numpy as numpy

cdef int loops

def timefunc(name):
    def timedecorator(f):
        cdef int L, i

        print("Running", name)
        for L in [1, 10, 100, 1000, 10000, 100000, 1000000]:
            start = time.clock()
            f(L)
            end = time.clock()
            print(format((end-start) / loops * 1e6, "2f"), end=" ")
            sys.stdout.flush()

        print("μs")
    return timedecorator

print()
print("INITIALISATIONS")
loops = 100000

@timefunc("cpython.array buffer")
def _(int L):
    cdef int i
    cdef array[double] arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cpython.array memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr
    cdef array template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cpython.array raw C type")
def _(int L):
    cdef int i
    cdef array arr, template = array('d')

    for i in range(loops):
        arr = clone(template, L, False)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("numpy.empty_like memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr
    template = numpy.empty((L,), dtype='double')

    for i in range(loops):
        arr = numpy.empty_like(template)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("malloc")
def _(int L):
    cdef int i
    cdef double* arrptr

    for i in range(loops):
        arrptr = <double*> malloc(sizeof(double) * L)
        free(arrptr)

    # Prevents dead code elimination
    str(arrptr[0])

@timefunc("malloc memoryview")
def _(int L):
    cdef int i
    cdef double* arrptr
    cdef double[::1] arr

    for i in range(loops):
        arrptr = <double*> malloc(sizeof(double) * L)
        arr = <double[:L]>arrptr
        free(arrptr)

    # Prevents dead code elimination
    str(arr[0])

@timefunc("cvarray memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr

    for i in range(loops):
        arr = cvarray((L,),sizeof(double),'d')

    # Prevents dead code elimination
    str(arr[0])



print()
print("ITERATING")
loops = 1000

@timefunc("cpython.array buffer")
def _(int L):
    cdef int i
    cdef array[double] arr = clone(array('d'), L, False)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("cpython.array memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = clone(array('d'), L, False)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("cpython.array raw C type")
def _(int L):
    cdef int i
    cdef array arr = clone(array('d'), L, False)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("numpy.empty_like memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = numpy.empty((L,), dtype='double')

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

@timefunc("malloc")
def _(int L):
    cdef int i
    cdef double* arrptr = <double*> malloc(sizeof(double) * L)

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arrptr[i]

    free(arrptr)

    # Prevents dead-code elimination
    str(d)

@timefunc("malloc memoryview")
def _(int L):
    cdef int i
    cdef double* arrptr = <double*> malloc(sizeof(double) * L)
    cdef double[::1] arr = <double[:L]>arrptr

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    free(arrptr)

    # Prevents dead-code elimination
    str(d)

@timefunc("cvarray memoryview")
def _(int L):
    cdef int i
    cdef double[::1] arr = cvarray((L,),sizeof(double),'d')

    cdef double d
    for i in range(loops):
        for i in range(L):
            d = arr[i]

    # Prevents dead-code elimination
    str(d)

Output:

INITIALISATIONS
Running cpython.array buffer
0.100040 0.097140 0.133110 0.121820 0.131630 0.108420 0.112160 μs
Running cpython.array memoryview
0.339480 0.333240 0.378790 0.445720 0.449800 0.414280 0.414060 μs
Running cpython.array raw C type
0.048270 0.049250 0.069770 0.074140 0.076300 0.060980 0.060270 μs
Running numpy.empty_like memoryview
1.006200 1.012160 1.128540 1.212350 1.250270 1.235710 1.241050 μs
Running malloc
0.021850 0.022430 0.037240 0.046260 0.039570 0.043690 0.030720 μs
Running malloc memoryview
1.640200 1.648000 1.681310 1.769610 1.755540 1.804950 1.758150 μs
Running cvarray memoryview
1.332330 1.353910 1.358160 1.481150 1.517690 1.485600 1.490790 μs

ITERATING
Running cpython.array buffer
0.010000 0.027000 0.091000 0.669000 6.314000 64.389000 635.171000 μs
Running cpython.array memoryview
0.013000 0.015000 0.058000 0.354000 3.186000 33.062000 338.300000 μs
Running cpython.array raw C type
0.014000 0.146000 0.979000 9.501000 94.160000 916.073000 9287.079000 μs
Running numpy.empty_like memoryview
0.042000 0.020000 0.057000 0.352000 3.193000 34.474000 333.089000 μs
Running malloc
0.002000 0.004000 0.064000 0.367000 3.599000 32.712000 323.858000 μs
Running malloc memoryview
0.019000 0.032000 0.070000 0.356000 3.194000 32.100000 327.929000 μs
Running cvarray memoryview
0.014000 0.026000 0.063000 0.351000 3.209000 32.013000 327.890000 μs

(The reason for the "iterations" benchmark is that some methods have surprisingly different characteristics in this respect.)

In order of initialisation speed:

malloc: This is a harsh world, but it's fast. If you need to to allocate a lot of things and have unhindered iteration and indexing performance, this has to be it. But normally you're a good bet for...

cpython.array raw C type: Well damn, it's fast. And it's safe. Unfortunately it goes through Python to access its data fields. You can avoid that by using a wonderful trick:

arr.data.as_doubles[i]

which brings it up to the standard speed while removing safety! This makes this a wonderful replacement for malloc, being basically a pretty reference-counted version!

cpython.array buffer: Coming in at only three to four times the setup time of malloc, this is looks a wonderful bet. Unfortunately it has significant overhead (albeit small compared to the boundscheck and wraparound directives). That means it only really competes against full-safety variants, but it is the fastest of those to initialise. Your choice.

cpython.array memoryview: This is now an order of magnitude slower than malloc to initialise. That's a shame, but it iterates just as fast. This is the standard solution that I would suggest unless boundscheck or wraparound are on (in which case cpython.array buffer might be a more compelling tradeoff).

The rest. The only one worth anything is numpy's, due to the many fun methods attached to the objects. That's it, though.

这篇关于建议为类型化的内存视图分配内存的方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆