使用SciPy接口和Cython直接调用BLAS/LAPACK [英] Calling BLAS / LAPACK directly using the SciPy interface and Cython

查看:144
本文介绍了使用SciPy接口和Cython直接调用BLAS/LAPACK的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这里有一篇文章: https://gist.github.com/JonathanRaiman/f2ce5331750da7b2d4e9 仅通过调用Fortran库(BLAS/LAPACK/Intel MKL/OpenBLAS/随NumPy一起安装的任何文件)即可显着提高速度.经过数小时的研究(由于不建议使用的SciPy库),我终于得到了编译,但没有结果.它比NumPy快2倍.不幸的是,正如另一个用户指出的那样,Fortran例程始终将输出矩阵添加到计算出的新结果中,因此它仅在第一次运行时与NumPy相匹配. IE. A := alpha*x*y.T + A.因此,这仍然有待快速解决.

There was a post on this here: https://gist.github.com/JonathanRaiman/f2ce5331750da7b2d4e9 which shows a great speed improvement by just calling the Fortran libraries (BLAS / LAPACK / Intel MKL / OpenBLAS / whatever you installed with NumPy). After many hours of working on this (because of deprecated SciPy libraries) I finally got it to compile with no results. It was 2x faster than NumPy. Unfortunately as another user pointed out, the Fortran routine is always adding the output matrix to the new results calculated, so it only matches NumPy on the 1st run. I.e. A := alpha*x*y.T + A. So that remains to be solved with a fast solution.

[更新:对于那些想要使用SCIPY接口的人来说,请直接在这里 https://github.com/scipy/scipy/blob/master/scipy/linalg/cython_blas.pyx ,因为它们已经将CPDEF语句中的BLAS/LAPACK呼叫优化/粘贴到您的CYTHON脚本中# Python-accessible wrappers for testing:也在cython_lapack.pyx上方的链接上可用,但没有Cython测试脚本]

[UPDATE: FOR THOSE OF YOU LOOKING TO USE THE SCIPY INTERFACE JUST GO HERE https://github.com/scipy/scipy/blob/master/scipy/linalg/cython_blas.pyx AS THEY ALREADY HAVE OPTIMIZED THE CALLS TO BLAS/LAPACK IN THE CPDEF STATEMENTS, JUST COPY / PASTE INTO YOUR CYTHON SCRIPT # Python-accessible wrappers for testing: Also at the link above cython_lapack.pyx is available but doesn't have the Cython test scripts]

import numpy as np;
from cyblas import outer_prod;
a=np.random.randint(0,100, 1000);
b=np.random.randint(0,100, 1000);
a=a.astype(np.float64)
b=b.astype(np.float64)
cy_outer=np.zeros((a.shape[0],b.shape[0]));
np_outer=np.zeros((a.shape[0],b.shape[0]));

%timeit outer_prod(a,b,cy_outer)
#%timeit outer_prod(a,b) #use with fixed version instead of above line, results will automatically update cy_outer
%timeit np.outer(a,b, np_outer)
100 loops, best of 3: 2.83 ms per loop
100 loops, best of 3: 6.58 ms per loop

#结束测试脚本

import cython
import numpy as np
cimport numpy as np

from cpython cimport PyCapsule_GetPointer 
cimport scipy.linalg.cython_blas
cimport scipy.linalg.cython_lapack
import scipy.linalg as LA

REAL = np.float64
ctypedef np.float64_t REAL_t
ctypedef np.uint64_t  INT_t

cdef int ONE = 1
cdef REAL_t ONEF = <REAL_t>1.0

ctypedef void (*dger_ptr) (const int *M, const int *N, const double *alpha, const double *X, const int *incX, double *Y, const int *incY, double *A, const int * LDA) nogil
cdef dger_ptr dger=<dger_ptr>PyCapsule_GetPointer(LA.blas.dger._cpointer, NULL)  # A := alpha*x*y.T + A

cpdef outer_prod(_x, _y, _output):
#cpdef outer_prod(_x, _y): #comment above line & use this to use the reset output matrix to zeros
    cdef REAL_t *x = <REAL_t *>(np.PyArray_DATA(_x))
    cdef int M = _y.shape[0]
    cdef int N = _x.shape[0]
    #cdef np.ndarray[np.float64_t, ndim=2, order='c'] _output = np.zeros((M,N)) #slow fix to uncomment to reset output matrix to zeros
    cdef REAL_t *y = <REAL_t *>(np.PyArray_DATA(_y))
    cdef REAL_t *output = <REAL_t *>(np.PyArray_DATA(_output))
    with nogil:
        dger(&M, &N, &ONEF, y, &ONE, x, &ONE, output, &M)

非常感谢.希望这可以为其他人节省一些时间(ALMOST可以工作)-实际上,正如我所说的,它可以工作1倍并匹配NumPy,然后每个后续调用都将再次添加到结果矩阵中.如果我将输出矩阵重置为0并重新运行结果,则匹配NumPy.奇怪...尽管如果取消注释,上面的几行内容也只能在NumPy速度下运行.已找到替代方法memset,并将在另一篇文章中……我只是还没有弄清楚如何调用它.

Much appreciated. Hope this saves other people some time (it ALMOST works) - actually as I commented it works 1x and matches NumPy then every subsequent call adds to the result matrix AGAIN. If I reset the output matrix to 0 and rerun the results then match NumPy. Strange... although if one uncomments the few lines above it will work although only at NumPy speeds. The alternative has been found memset and will be in another post... I just haven't figured out exactly how to call it yet.

推荐答案

根据

According to netlib dger(M, N, ALPHA, X INCX, Y, INCY, A, LDA) performs A := alpha*x*y**T + A. So A should be all zeros to get the outer product of X and Y.

这篇关于使用SciPy接口和Cython直接调用BLAS/LAPACK的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆