在cython中高效的矩阵运算,没有python对象 [英] Efficient matrix operations in cython with no python objects

查看:72
本文介绍了在cython中高效的矩阵运算,没有python对象的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个函数,该函数希望使用Cython尽可能多地转换为C。为此,我需要使用线性代数运算。这是我的职能。 编辑:我学到的教训是尝试在循环之外处理线性代数,这在很大程度上是我能够做到的。否则,请包装LAPACK / BLAS或编写我自己的函数。

 将numpy导入为np 
从scipy.stats导入multivariate_normal为mv
导入itertools

def llf(data,rho,mu,sigma,A,V,n):
'''通过guass-hermite正交

参数
----------
数据:数组
N x J矩阵,列为测量值
rho:数组
长度L表示法线混合的权重向量
mu:数组
L x K表示法线混合均值的向量
sigma:数组
K x法线混合的方差矩阵的L * K方差矩阵
A:数组
J x(K + 1)载荷矩阵
V:数组
J x J测量方差矩阵错误
n:int
正交
的采样点数'''

N = data.shape [0]
L,K = mu。形状

#变胖s和近似积分
v的采样点,w = np.polynomial.hermite.hermgauss(n)

totllf = 0
对于范围(N)中的i:
M_i =数据[i,:]
totllf_i = 0
对于范围(L)中的l:
rho_l = rho [l]
sigma_l = sigma [:, K * l:K *(l + 1)]
mu_l = mu [l,:]
chol_l = np.linalg.cholesky(sigma_l)
for itertools.product(*(list (范围(K)中k的范围(n))):
wt = np.prod(w [list(ix)])
pt = np.sqrt(2)* chol_l.dot (v [list(ix)])+ mu_l
totllf_i + = wt * rho_l * mv.pdf(M_i,A [:, 0] + A [:, 1 :: .. dot(pt),V)

totllf + = np.log(totllf_i)

返回totllf

为了实现这一点,我需要具有矩阵乘法,转置,行列式,矩阵逆和cholesky分解的功能。我看过



例如,在221行中,我在编译时收到以下消息:应键入索引以提高访问效率。但是我以为我输入了索引k。另外,由于 addvv_c 显示为黄色,因此在下面向您显示其定义。

  cpdef void addvv_c(double [:] a,double [:] b,double [:] out):
'''逐个添加两个向量
'''
cdef Py_ssize_t i,n = a.shape [0]

对于范围(n)中的i:
out [i] = a [i] + b [i]


解决方案

关于优化的Cython / BLAS功能的一些小问题:

  ipiv_a = np.zeros(n).astype(np.int32)
cdef int [:: 1] ipiv = ipiv_a

可以有两个简单的改进:不必通过一个临时变量,您可以直接创建一个数组键入 np.int32 而不是创建其他类型的数组然后进行强制转换:

  cdef int [:: 1] ipiv = np.zeros(n,dtype = np.int32)

同样,在两种功能中您可以通过执行以下操作以较少的步骤初始化 B

  cdef double [:, :: 1] B = A.copy()

而不是创建零数组并复制




第二个(更重要的)更改是将C数组用于临时变量,例如Fortran工作区。我仍然将返回值保留为numpy数组,因为引用计数和将它们发送回Python的功能非常有用。

  cdef double * work =< double *> malloc(n * n * sizeof(double))
尝试:
#函数其余部分
最终:
免费(工作)

您需要导入 malloc 免费来自 libc.stdlib try:...最后:确保内存已正确释放。不要太在意-例如,如果不清楚在哪里释放C数组,则只需使用numpy。




要查看的最后一个选项是不具有返回值,而是修改输入:

  cdef void inv_c(double [:,:: 1 ] A,double [:,:: 1] B):
#检查A和B的大小是否正确,然后将其写入B
#...

这样做的好处是,如果需要在具有相同大小的输入的循环中调用此函数,则只需要为整个循环进行一次分配。您可以将其扩展为也包括工作数组,尽管那可能要复杂一些。


I'm writing a function that I'd like to translate to C as much as possible using Cython. To do this, I'll need to use linear algebra operations. Here is my function. EDIT: The lesson I've learned is to try to take care of linear algebra outside of loops, which I've largely been able to do. Otherwise, resort to wrapping LAPACK/BLAS or writing my own functions.

import numpy as np
from scipy.stats import multivariate_normal as mv
import itertools

def llf(data, rho, mu, sigma, A, V, n):
    '''evaluate likelihood by guass-hermite quadrature

    Parameters
    ----------
    data : array
        N x J matrix, columns are measurements
    rho : array
        length L vector of weights for mixture of normals
    mu : array
        L x K vector of means of mixture of normals
    sigma : array
        K x L*K matrix of variance matrices for mixture of normals
    A : array
        J x (K + 1) matrix of loadings
    V : array
        J x J variance matrix of measurement errors
    n : int
        number of sample points for quadrature
    '''

    N = data.shape[0]
    L, K = mu.shape

    # getting weights and sample points for approximating integral
    v, w = np.polynomial.hermite.hermgauss(n)

    totllf = 0
    for i in range(N):
        M_i = data[i, :]
        totllf_i = 0
        for l in range(L):
            rho_l = rho[l]
            sigma_l = sigma[:, K*l:K*(l+1)]
            mu_l = mu[l, :]
            chol_l = np.linalg.cholesky(sigma_l)
            for ix in itertools.product(*(list(range(n)) for k in range(K))):
               wt =  np.prod(w[list(ix)])
               pt = np.sqrt(2)*chol_l.dot(v[list(ix)]) + mu_l
               totllf_i += wt*rho_l*mv.pdf(M_i, A[:, 0] + A[:, 1:].dot(pt), V)

        totllf += np.log(totllf_i)

    return totllf

In order to accomplish this I'd need to have functions for matrix multiplication, transpose, determinant, matrix inverse and cholesky decompostion. I've seen some posts on using BLAS functions, but its really unclear to me how to use these.

EDIT 04/29/18

As suggested, I've taken a memory view approach and initialized everything prior to the loop. My new function is written as

def llf_c(double[:, ::1] data, double[::1] rho, double[:, ::1] mu, 
                double[:, ::1] sigma, double[:, ::1] A, double[:, ::1] V, int n):
    '''evaluate likelihood by guass-hermite quadrature

    Parameters
    ----------
    data : array
        N x J matrix, columns are measurements
    rho : array
        length L vector of weights for mixture of normals
    mu : array
        L x K vector of means of mixture of normals
    sigma : array
        K x L*K matrix of variance matrices for mixture of normals
    A : array
        J x (K + 1) matrix of loadings
    V : array
        J x J variance matrix of measurement errors
    n : int
        number of sample points for quadrature
    '''

    cdef Py_ssize_t N = data.shape[0], J = data.shape[1], L = mu.shape[0], K = mu.shape[1]

    # initializing indexing variables
    cdef Py_ssize_t i, l, j, k

    # getting weights and sample points for approximating integral
    v_a, w_a = np.polynomial.hermite.hermgauss(n)
    cdef double[::1] v = v_a
    cdef double[::1] w = w_a
    cdef double[::1] v_ix = np.zeros(K, dtype=np.float)

    # initializing memory views for cholesky decomposition of sigma matrices
    sigma_chol_a = np.zeros((K, L*K), dtype=np.float)
    for l in range(L):
        sigma_chol_a[:, K*l:K*(l+1)] = np.linalg.cholesky(sigma[:, K*l:K*(l+1)])

    cdef double[:, ::1] sigma_chol = sigma_chol_a

    # intializing V inverse and determinant
    cdef double[:, ::1] V_inv = np.linalg.inv(V)
    cdef double V_det = np.linalg.det(V)

    # initializing memoryviews for work matrices
    cdef double[::1] work_K = np.zeros(K, dtype=np.float)
    cdef double[::1] work_J = np.zeros(J, dtype=np.float)

    # initializing memoryview for quadrature points
    cdef double[::1] pt = np.zeros(K, dtype=np.float)

    # initializing memorview for means for multivariate normal
    cdef double[::1] loc = np.zeros(J, dtype=np.float)

    # initializing values for loop
    cdef double[::1] totllf = np.zeros(N, dtype=np.float)
    cdef double wt, pdf_init = 1./sqrt(((2*pi)**J)*V_det)
    cdef int[:, ::1] ix_vals = np.vstack(itertools.product(*(list(range(n)) for k in range(K)))).astype(np.int32)
    cdef Py_ssize_t ix_len = ix_vals.shape[0]

    for ix_row in range(ix_len):

        ix = ix_vals[ix_row]

        # weights and points for quadrature
        wt = 1.
        for k in range(K):
            wt *= w[ix[k]]
            v_ix[k] = v[ix[k]]

        for l in range(L):

            # change of variables
            dotmv_c(sigma_chol[:, K*l:K*(l+1)], v_ix, work_K)
            for k in range(K):
                pt[k] = sqrt(2)*work_K[k]
            addvv_c(pt, mu[l, :], pt)

            for i in range(N):

                # generating demeaned vector for multivariate normal pdf
                dotmv_c(A[:, 1:], pt, work_J)
                addvv_c(A[:, 0], work_J, work_J)
                for j in range(J):
                    loc[j] = data[i, j] - work_J[j]

                # performing matrix products in exponential
                # print(wt, rho[l], np.asarray(work_J))
                dotvm_c(loc, V_inv, work_J)
                totllf[i] += wt*rho[l]*pdf_init*exp(-0.5*dotvv_c(work_J, loc))

    return np.log(np.asarray(totllf)).sum()

The dotvm_c, dotmv_c and addvv_c are functions that performance matrix multiplication of vector and matrix, matrix and vector, and elementwise addition of two vectors. I've written these in Cython as well, but am not including for brevity. I'm no longer wrapping any LAPACK functions as I do all other linear algebra prior to the loop using numpy. I have a few more questions. Why do I still have yellow in the loop? (See profile below). I thought everything should be in C now. Also if you have any other suggestions based on new implementation, please let me know.

For instance, in line 221, I get this message upon compiling: "Index should be typed for more efficient access." But I thought I typed the index k. Also, since addvv_c shows up in yellow, I'll show you it's definition below.

cpdef void addvv_c(double[:] a, double[:] b, double[:] out):
    '''add two vectors elementwise
    '''
    cdef Py_ssize_t i, n = a.shape[0]

    for i in range(n):
        out[i] = a[i] + b[i]

解决方案

A few of small points with respect to your optimized Cython/BLAS functions:

ipiv_a = np.zeros(n).astype(np.int32)
cdef int[::1] ipiv = ipiv_a

can have two easy improvements: it doesn't has to go through a temporary variable, and you can directly create an array of type np.int32 rather than create an array of a different type then casting:

 cdef int[::1] ipiv = np.zeros(n,dtype=np.int32)

Simiarly, in both fucntions you can initialize B in a fewer steps by doing

cdef double[:, ::1] B = A.copy()

rather than creating a zero array and copying


The second (more significant) change would be to use C arrays for temporary variables such as the Fortran workspaces. I'd still keep things like return values as numpy arrays since the reference counting and ability to send them back to Python is very useful.

 cdef double* work = <double*>malloc(n*n*sizeof(double))
 try:
     # rest of function
 finally:
     free(work)

You need to cimport malloc and free from libc.stdlib. The try: ... finally: ensures the memory is correctly deallocated. Don't go too over the top with this - for example, if it isn't obvious where to deallocate a C array then just use numpy.


The final option to look at is to not have a return value but to modify an input:

cdef void inv_c(double[:,::1] A, double[:,::1] B):
    # check that A and B are the right size, then just write into B
    # ...

The advantage of this is that if you need to call this in a loop with identical sized inputs then you only need to do one allocation for the whole loop. You could extend this to include the working arrays too, although that might be a little more complicated.

这篇关于在cython中高效的矩阵运算,没有python对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆