Why is cffi so much quicker than numpy?


Question


I have been playing around with writing cffi modules in Python, and their speed is making me wonder if I'm using standard Python correctly. It's making me want to switch to C completely! Truthfully, there are some great Python libraries I could never reimplement myself in C, so this is more hypothetical than anything, really.

This example shows the sum function in Python being used with a numpy array, and how slow it is in comparison with a C function. Is there a quicker pythonic way of computing the sum of a numpy array?

import numpy as np
from cffi import FFI

def cast_matrix(matrix, ffi):
    # build a double** whose row pointers index into the numpy buffer
    ap = ffi.new("double* [%d]" % (matrix.shape[0]))
    ptr = ffi.cast("double *", matrix.ctypes.data)
    for i in range(matrix.shape[0]):
        ap[i] = ptr + i*matrix.shape[1]
    return ap

ffi = FFI()
ffi.cdef("""
double sum(double**, int, int);
""")
C = ffi.verify("""
double sum(double** matrix, int x, int y){
    int i, j;
    double sum = 0.0;
    for (i=0; i<x; i++){
        for (j=0; j<y; j++){
            sum = sum + matrix[i][j];
        }
    }
    return(sum);
}
""")

m = np.ones(shape=(10,10))
print('numpy says', m.sum())

m_p = cast_matrix(m, ffi)

sm = C.sum(m_p, m.shape[0], m.shape[1])
print('cffi says', sm)

just to show the function works:

numpy says 100.0
cffi says 100.0
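The `cast_matrix` trick above only works because the numpy array is C-contiguous: row `i` starts exactly `i * ncols` doubles past the base pointer. A minimal sketch of that layout assumption, using `ctypes` instead of cffi so no C compiler is needed:

```python
import ctypes
import numpy as np

m = np.ones((10, 10))

# the double** trick assumes row-major, contiguous storage
assert m.flags['C_CONTIGUOUS']

# view the flat buffer as a C double array, just like ffi.cast does
base = ctypes.cast(m.ctypes.data, ctypes.POINTER(ctypes.c_double))
nrows, ncols = m.shape

# element m[i, j] lives at flat offset i*ncols + j
total = sum(base[i * ncols + j] for i in range(nrows) for j in range(ncols))
print(total)   # 100.0
```

A sliced view such as `m[::2]` would fail the contiguity check, and the pointer arithmetic would read the wrong elements.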

Now, if I time this simple function, I find that numpy is really slow! Am I using numpy in the correct way? Is there a faster way to calculate the sum in Python?

import time

n = 1000000

t0 = time.time()
for i in range(n): C.sum(m_p, m.shape[0], m.shape[1])
t1 = time.time()

print('cffi', t1-t0)

t0 = time.time()
for i in range(n): m.sum()
t1 = time.time()

print('numpy', t1-t0)

times:

cffi 0.818415880203
numpy 5.61657714844
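Note that a 10x10 array is so small that the timing above is dominated by per-call overhead rather than the actual additions. A rough sketch of this (the array sizes here are illustrative, not from the original benchmark):

```python
import timeit
import numpy as np

small = np.ones((10, 10))
big = np.ones((1000, 1000))

# per-call cost: for the small array it is almost all fixed Python
# dispatch overhead; for the big array the additions dominate
t_small = timeit.timeit(small.sum, number=10000) / 10000
t_big = timeit.timeit(big.sum, number=100) / 100

print('small: %.2e s/call, %.2e s/element' % (t_small, t_small / small.size))
print('big:   %.2e s/call, %.2e s/element' % (t_big, t_big / big.size))
```

The per-element cost of the small array should come out far higher than the big one, which suggests numpy's disadvantage here is call overhead, not the summation loop itself.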

Solution

Numpy is slower than C for two reasons: the Python overhead (probably similar to cffi's) and generality. Numpy is designed to deal with arrays of arbitrary dimensions, in a bunch of different data types. Your cffi example was written for a 2D array of floats. The cost was writing several lines of code versus `.sum()`, six characters, to save less than five microseconds per call. (But of course, you already knew this.) I just want to emphasise that CPU time is cheap, much cheaper than developer time.
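To illustrate the generality point: the single `.sum()` method handles any dtype, any number of dimensions, partial reductions along an axis, and non-contiguous views, none of which the hand-written C function above can do. A quick sketch:

```python
import numpy as np

a_int = np.arange(6, dtype=np.int64).reshape(2, 3)   # integer, 2D
a_3d = np.ones((2, 3, 4), dtype=np.float32)          # float32, 3D
a_strided = np.ones((10, 10))[::2, ::2]              # non-contiguous view

print(a_int.sum())             # 15
print(a_3d.sum())              # 24.0
print(a_3d.sum(axis=2).shape)  # (2, 3) -- partial reduction
print(a_strided.sum())         # 25.0
```

The cffi version would need a separate C function (and a different pointer cast) for each of these cases.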

Now, if you want to stick with Numpy and you want better performance, your best option is to use Bottleneck. It provides a few functions optimised for 1D and 2D arrays of floats and doubles, and they are blazing fast. In your case, about 16 times faster, which would put the execution time at roughly 0.35 s, or about twice as fast as cffi.
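A minimal sketch of using Bottleneck as a drop-in for the sum, hedged with a fallback since Bottleneck is an optional third-party package:

```python
import numpy as np

try:
    import bottleneck as bn
    fast_sum = bn.nansum   # specialised, compiled kernels for 1D/2D floats
except ImportError:
    fast_sum = np.nansum   # fallback if Bottleneck is not installed

m = np.ones((10, 10))
print(fast_sum(m))   # 100.0
```

Either path gives the same result; the Bottleneck path simply skips much of numpy's generic dispatch machinery.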

For functions that Bottleneck does not provide, you can use Cython. It helps you write C code with a more pythonic syntax. Or, if you prefer, progressively convert your Python into C until you are happy with the speed.
