C array vs NumPy array

Question

In terms of performance (algebraic operations, lookup, caching, etc.), is there a difference between C arrays (which can be exposed as a C array, a cython.view.array [Cython array], or a memoryview of either of the two) and NumPy arrays (which in Cython should have no Python overhead)?

EDIT:

I should mention that the NumPy arrays are statically typed in Cython, with NumPy compile-time dtypes (e.g. cdef np.int_t or cdef np.float32_t), while in the C case the types are the C equivalents (cdef int and cdef float).

EDIT2:

Here is the example from the Cython Memoryview documentation to further illustrate my question:

from cython.view cimport array as cvarray
import numpy as np

# Memoryview on a NumPy array
narr = np.arange(27, dtype=np.dtype("i")).reshape((3, 3, 3))
cdef int [:, :, :] narr_view = narr

# Memoryview on a C array
cdef int carr[3][3][3]
cdef int [:, :, :] carr_view = carr

# Memoryview on a Cython array
cyarr = cvarray(shape=(3, 3, 3), itemsize=sizeof(int), format="i")
cdef int [:, :, :] cyarr_view = cyarr

Is there any difference between sticking with a C array vs a Cython array vs a NumPy array?

Answer

My knowledge on this is still imperfect, but this may be helpful. I ran some informal benchmarks to show what each array type is good for and was intrigued by what I found.

Though these array types are different in many ways, if you are doing heavy computation with large arrays, you should be able to get similar performance out of any of them since item-by-item access should be roughly the same across the board.

A NumPy array is a Python object implemented using Python's C API. NumPy arrays do provide an API at the C level, but they cannot be created independently of the Python interpreter. They are especially useful because of all the different array manipulation routines available in NumPy and SciPy.

A Cython memory view is also a Python object, but it is made as a Cython extension type. It does not appear to be designed for use in pure Python since it isn't a part of Cython that can be imported directly from Python, but you can return a view to Python from a Cython function. You can look at the implementation at https://github.com/cython/cython/blob/master/Cython/Utility/MemoryView.pyx

A C array is a native type in the C language. It is indexed like a pointer, but arrays and pointers are different; there is some good discussion of this at http://c-faq.com/aryptr/index.html. C arrays can be allocated on the stack and are easier for the C compiler to optimize, but they will be more difficult to access outside of Cython. I know you can make a NumPy array from memory that has been dynamically allocated by other programs, but it seems a lot more difficult that way. Travis Oliphant posted an example of this at http://blog.enthought.com/python/numpy-arrays-with-pre-allocated-memory/. If you are using C arrays or pointers for temporary storage within your program, they should work very well for you. They will not be as convenient for slicing or any other sort of vectorized computation, since you will have to do everything yourself with explicit looping, but they should allocate and deallocate faster and ought to provide a good baseline for speed.

Cython also provides an array class. It looks like it is designed for internal use. Instances are created when a memoryview is copied. See http://docs.cython.org/src/userguide/memoryviews.html#view-cython-arrays

In Cython, you can also allocate memory and index a pointer to treat the allocated memory somewhat like an array. See http://docs.cython.org/src/tutorial/memory_allocation.html
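The malloc-and-index pattern can be mimicked from pure Python with ctypes. This is only a rough analogue for illustration (in Cython you would allocate with malloc or PyMem_Malloc as the linked tutorial shows); the variable names here are my own:

```python
import ctypes

# Allocate one raw block of 5 doubles, analogous to
# malloc(5 * sizeof(double)) in Cython, and index it like an array.
n = 5
buf = (ctypes.c_double * n)()
for i in range(n):
    buf[i] = i * 2.0

print(list(buf))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

As with a malloc'd pointer in Cython, the block carries no shape or dtype metadata; the programmer is responsible for staying in bounds.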

Here are some benchmarks that show somewhat similar performance for indexing large arrays. This is the Cython file.

from numpy cimport ndarray as ar, uint64_t
cimport cython
import numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
def ndarr_time(uint64_t n=1000000, uint64_t size=10000):
    cdef:
        ar[uint64_t] A = np.empty(n, dtype=np.uint64)
        uint64_t i, j
    for i in range(n):
        for j in range(size):
            A[j] = n

def carr_time(uint64_t n=1000000):
    cdef:
        ar[uint64_t] A = np.empty(n, dtype=np.uint64)
        uint64_t AC[10000]
        uint64_t a
        int i, j
    for i in range(n):
        for j in range(10000):
            AC[j] = n

@cython.boundscheck(False)
@cython.wraparound(False)
def ptr_time(uint64_t n=1000000, uint64_t size=10000):
    cdef:
        ar[uint64_t] A = np.empty(n, dtype=np.uint64)
        uint64_t* AP = &A[0]
        uint64_t a
        int i, j
    for i in range(n):
        for j in range(size):
            AP[j] = n

@cython.boundscheck(False)
@cython.wraparound(False)
def view_time(uint64_t n=1000000, uint64_t size=10000):
    cdef:
        ar[uint64_t] A = np.empty(n, dtype=np.uint64)
        uint64_t[:] AV = A
        uint64_t i, j
    for i in range(n):
        for j in range(size):
            AV[j] = n

Timing these using IPython we obtain

%timeit -n 10 ndarr_time()
%timeit -n 10 carr_time()
%timeit -n 10 ptr_time()
%timeit -n 10 view_time()

10 loops, best of 3: 6.33 s per loop
10 loops, best of 3: 3.12 s per loop
10 loops, best of 3: 6.26 s per loop
10 loops, best of 3: 3.74 s per loop

These results struck me as a little odd, considering that, as per Efficiency: arrays vs pointers , arrays are unlikely to be significantly faster than pointers. It appears that some sort of compiler optimization is making the pure C arrays and the typed memory views faster. I tried turning off all the optimization flags on my C compiler and got the timings

1 loops, best of 3: 25.1 s per loop
1 loops, best of 3: 25.5 s per loop
1 loops, best of 3: 32 s per loop
1 loops, best of 3: 28.4 s per loop

It looks to me like the item-by-item access is pretty much the same across the board, except that C arrays and Cython memory views seem to be easier for the compiler to optimize.

More commentary on this can be seen in these two blog posts I found some time ago:
http://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/
http://jakevdp.github.io/blog/2012/08/16/memoryview-benchmarks-2/

In the second blog post he comments on how, if memory view slices are inlined, they can provide speeds similar to that of pointer arithmetic. I have noticed in some of my own tests that explicitly inlining functions that use memory view slices isn't always necessary. As an example, I'll compute the inner product of every combination of two rows of an array.

from numpy cimport ndarray as ar
cimport cython
from numpy import empty

# An inlined dot product
@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline double dot_product(double[:] a, double[:] b, int size):
    cdef int i
    cdef double tot = 0.
    for i in range(size):
        tot += a[i] * b[i]
    return tot

# non-inlined dot-product
@cython.boundscheck(False)
@cython.wraparound(False)
cdef double dot_product_no_inline(double[:] a, double[:] b, int size):
    cdef int i
    cdef double tot = 0.
    for i in range(size):
        tot += a[i] * b[i]
    return tot

# function calling inlined dot product
@cython.boundscheck(False)
@cython.wraparound(False)
def dot_rows_slicing(ar[double,ndim=2] A):
    cdef:
        double[:,:] Aview = A
        ar[double,ndim=2] res = empty((A.shape[0], A.shape[0]))
        int i, j
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            res[i,j] = dot_product(Aview[i], Aview[j], A.shape[1])
    return res

# function calling non-inlined version
@cython.boundscheck(False)
@cython.wraparound(False)
def dot_rows_slicing_no_inline(ar[double,ndim=2] A):
    cdef:
        double[:,:] Aview = A
        ar[double,ndim=2] res = empty((A.shape[0], A.shape[0]))
        int i, j
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            res[i,j] = dot_product_no_inline(Aview[i], Aview[j], A.shape[1])
    return res

# inlined dot product using numpy arrays
@cython.boundscheck(False)
@cython.wraparound(False)
cdef inline double ndarr_dot_product(ar[double] a, ar[double] b):
    cdef int i
    cdef double tot = 0.
    for i in range(a.size):
        tot += a[i] * b[i]
    return tot

# non-inlined dot product using numpy arrays
@cython.boundscheck(False)
@cython.wraparound(False)
cdef double ndarr_dot_product_no_inline(ar[double] a, ar[double] b):
    cdef int i
    cdef double tot = 0.
    for i in range(a.size):
        tot += a[i] * b[i]
    return tot

# function calling inlined numpy array dot product
@cython.boundscheck(False)
@cython.wraparound(False)
def ndarr_dot_rows_slicing(ar[double,ndim=2] A):
    cdef:
        ar[double,ndim=2] res = empty((A.shape[0], A.shape[0]))
        int i, j
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            res[i,j] = ndarr_dot_product(A[i], A[j])
    return res

# function calling non-inlined version for numpy arrays
@cython.boundscheck(False)
@cython.wraparound(False)
def ndarr_dot_rows_slicing_no_inline(ar[double,ndim=2] A):
    cdef:
        ar[double,ndim=2] res = empty((A.shape[0], A.shape[0]))
        int i, j
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            res[i,j] = ndarr_dot_product_no_inline(A[i], A[j])
    return res

# Version with explicit looping and item-by-item access.
@cython.boundscheck(False)
@cython.wraparound(False)
def dot_rows_loops(ar[double,ndim=2] A):
    cdef:
        ar[double,ndim=2] res = empty((A.shape[0], A.shape[0]))
        int i, j, k
        double tot
    for i in range(A.shape[0]):
        for j in range(A.shape[0]):
            tot = 0.
            for k in range(A.shape[1]):
                tot += A[i,k] * A[j,k]
            res[i,j] = tot
    return res

Timing these we see

from numpy.random import rand
A = rand(1000, 1000)
%timeit dot_rows_slicing(A)
%timeit dot_rows_slicing_no_inline(A)
%timeit ndarr_dot_rows_slicing(A)
%timeit ndarr_dot_rows_slicing_no_inline(A)
%timeit dot_rows_loops(A)

1 loops, best of 3: 1.02 s per loop
1 loops, best of 3: 1.02 s per loop
1 loops, best of 3: 3.65 s per loop
1 loops, best of 3: 3.66 s per loop
1 loops, best of 3: 1.04 s per loop

The results were as fast with explicit inlining as they were without it. In both cases, the typed memory views were comparable to a version of the function that was written without slicing.

In the blog post, he had to write a specific example to force the compiler to not inline a function. It appears that a decent C compiler (I'm using MinGW) is able to take care of these optimizations without being told to inline certain functions. Memoryviews can be faster for passing array slices between functions within a Cython module, even without explicit inlining.

In this particular case, however, even pushing the loops to C doesn't really reach a speed anywhere near what can be achieved through proper use of matrix multiplication. The BLAS is still the best way to do things like this.

%timeit A.dot(A.T)
10 loops, best of 3: 25.7 ms per loop
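The equivalence between the explicit row-by-row inner products above and a single matrix product can be checked in plain NumPy. This is a small sanity check of the algebra, not a benchmark; the array sizes and seed are arbitrary:

```python
import numpy as np

# Row-by-row inner products, mirroring the dot_rows_loops logic above.
A = np.random.default_rng(0).random((50, 60))
res = np.empty((50, 50))
for i in range(50):
    for j in range(50):
        res[i, j] = np.dot(A[i], A[j])

# One BLAS-backed matrix product computes the same thing at once.
assert np.allclose(res, A.dot(A.T))
```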

There is also automatic conversion from NumPy arrays to memoryviews as in

cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def cysum(double[:] A):
    cdef double tot = 0.
    cdef int i
    for i in range(A.size):
        tot += A[i]
    return tot

The one catch is that, if you want a function to return a NumPy array, you will have to use np.asarray to convert the memory view object back into a NumPy array. This is a relatively inexpensive operation since memory views comply with the buffer protocol (PEP 3118, http://legacy.python.org/dev/peps/pep-3118/).
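The same buffer-protocol machinery that makes this conversion cheap can be seen from pure Python, where np.asarray wraps any buffer-exporting object without copying. Here a bytearray stands in for the Cython memory view:

```python
import numpy as np

# bytearray exports the buffer protocol, just as Cython memory views do.
buf = bytearray(range(8))
arr = np.asarray(memoryview(buf), dtype=np.uint8)

# The array is a zero-copy view: writes go through to the original buffer.
arr[0] = 42
print(buf[0])  # 42
```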

Typed memory views seem to be a viable alternative to NumPy arrays for internal use in a Cython module. Array slicing will be faster with memory views, but there are not as many functions and methods written for memory views as there are for NumPy arrays. If you don't need to call a bunch of the NumPy array methods and want easy array slicing, you can use memory views in place of NumPy arrays. If you need both the array slicing and the NumPy functionality for a given array, you can make a memory view that points to the same memory as the NumPy array. You can then use the view for passing slices between functions and the array for calling NumPy functions. That approach is still somewhat limited, but it will work well if you are doing most of your processing with a single array.

C arrays and/or dynamically allocated blocks of memory could be useful for intermediate calculations, but they are not as easy to pass back to Python for use there. In my opinion, it is also more cumbersome to dynamically allocate multidimensional C arrays. The best approach I am aware of is to allocate a large block of memory and then use integer arithmetic to index it as if it were a multidimensional array. This could be an issue if you want easy allocation of arrays on the fly. On the other hand, allocation times are probably a good bit faster for C arrays. The other array types are designed to be nearly as fast and much more convenient, so I would recommend using them unless there is a compelling reason to do otherwise.
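The flat-block trick for a "multidimensional" allocation is just row-major index arithmetic. Sketched in Python for clarity (Cython code indexing a malloc'd pointer would use the same offset formula; the helper name `at` is my own):

```python
import numpy as np

nrows, ncols = 3, 4
flat = np.arange(nrows * ncols)  # stands in for one malloc'd block

def at(i, j):
    # Row-major layout: element (i, j) lives at offset i * ncols + j.
    return flat[i * ncols + j]

# Same element as a real 2-D view of the same memory.
assert at(2, 3) == flat.reshape(nrows, ncols)[2, 3]
```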

Update: As mentioned in the answer by @Veedrac you can still pass Cython memory views to most NumPy functions. When you do this, NumPy will usually have to create a new NumPy array object to work with the memory view anyway, so this will be somewhat slower. For large arrays the effect will be negligible. A call to np.asarray for a memory view will be relatively fast regardless of array size. However, to demonstrate this effect, here is another benchmark:

The Cython file:

from numpy cimport ndarray as ar

def npy_call_on_view(npy_func, double[:] A, int n):
    cdef int i
    for i in range(n):
        npy_func(A)

def npy_call_on_arr(npy_func, ar[double] A, int n):
    cdef int i
    for i in range(n):
        npy_func(A)

In IPython:

import numpy as np
from numpy.random import rand
A = rand(1)
%timeit npy_call_on_view(np.amin, A, 10000)
%timeit npy_call_on_arr(np.amin, A, 10000)

Output:

10 loops, best of 3: 282 ms per loop
10 loops, best of 3: 35.9 ms per loop

I tried to choose an example that would show this effect well. Unless many NumPy function calls on relatively small arrays are involved, this shouldn't change the time a whole lot. Keep in mind that, regardless of which way we are calling NumPy, a Python function call still occurs.

This applies only to the functions in NumPy. Most of the array methods are not available for memory views (though some attributes, like size, shape, and T, still are). For example, A.dot(A.T) with NumPy arrays would become np.dot(A, A.T).
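In plain NumPy the method form and the function form are interchangeable, which is what makes this rewrite mechanical; a quick check (arbitrary seed and shape):

```python
import numpy as np

A = np.random.default_rng(1).random((4, 3))

# Method form (requires an ndarray) vs. function form, which is the
# one that also accepts buffer-like objects such as memory views.
assert np.allclose(A.dot(A.T), np.dot(A, A.T))
assert A.min() == np.amin(A)
```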
