使用MKL BLAS时,scipy是否支持稀疏矩阵乘法的多线程处理? [英] Does scipy support multithreading for sparse matrix multiplication when using MKL BLAS?

查看:304
本文介绍了使用MKL BLAS时,scipy是否支持稀疏矩阵乘法的多线程处理?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据MKL BLAS文档 所有矩阵矩阵操作(级别3)都针对稠密和稀疏BLAS进行了线程化." http://software.intel.com /en-us/articles/parallelism-in-the-inth-math内核库

According to MKL BLAS documentation "All matrix-matrix operations (level 3) are threaded for both dense and sparse BLAS." http://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library

我已经用MKL BLAS构建了Scipy.使用下面的测试代码,我看到了密集而不是稀疏矩阵乘法的预期多线程加速. Scipy是否有任何更改以启用多线程稀疏操作?

I have built Scipy with MKL BLAS. Using the test code below, I see the expected multithreaded speedup for dense, but not sparse, matrix multiplication. Are there any changes to Scipy to enable multithreaded sparse operations?

# test dense matrix multiplication
from numpy import *
import time    
x = random.random((10000,10000))
t1 = time.time()
foo = dot(x.T, x)
print time.time() - t1

# test sparse matrix multiplication
from scipy import sparse
x = sparse.rand(10000,10000)
t1 = time.time()
foo = dot(x.T, x)
print time.time() - t1

推荐答案

据我所知,答案是否定的.但是,您可以围绕MKL稀疏乘法例程构建自己的包装器.您询问了将两个稀疏矩阵相乘的问题.以下是一些包装代码,我将一个稀疏矩阵乘以一个密集向量,因此它应该很容易适应(请参阅Intel MKL参考中的mkl_cspblas_dcsrgemm).另外,请注意scipy数组的存储方式:默认为coo,但csr(或csc)可能是更好的选择.我选择了csr,但MKL支持大多数类型(只需调用适当的例程即可).

As far as I know, the answer is no. But, you can build your own wrapper around the MKL sparse multiply routines. You asked about the multiplying two sparse matrices. Below is some a wrapper code I used for multiplying one sparse matrix times a dense vector, so it shouldn't be hard to adapt (look at the Intel MKL reference for mkl_cspblas_dcsrgemm). Also, be aware of how your scipy arrays are stored: default is coo, but csr (or csc) may be better choices. I chose csr, but MKL supports most types (just call the appropriate routine).

据我所知,scipy的默认值和MKL都是多线程的.通过更改OMP_NUM_THREADS,我可以看到性能上的差异.

From what I could tell, both scipy's default and MKL are multithreaded. By changing OMP_NUM_THREADS I could see a difference in performance.

要使用下面的功能,如果您具有MKL的最新版本,只需确保已将LD_LIBRARY_PATHS设置为包括相关的MKL目录.对于较旧的版本,您需要构建一些特定的库.我从使用python的IntelMKL中获取了我的信息

To use the function below, if you havea a recent version of MKL, just make sure you have LD_LIBRARY_PATHS set to include the relevant MKL directories. For older versions, you need to build some specific libraries. I got my information from IntelMKL in python

def SpMV_viaMKL( A, x ):
 """
 Wrapper to Intel's SpMV
 (Sparse Matrix-Vector multiply)
 For medium-sized matrices, this is 4x faster
 than scipy's default implementation
 Stephen Becker, April 24 2014
 stephen.beckr@gmail.com
 """

 import numpy as np
 import scipy.sparse as sparse
 from ctypes import POINTER,c_void_p,c_int,c_char,c_double,byref,cdll
 mkl = cdll.LoadLibrary("libmkl_rt.so")

 SpMV = mkl.mkl_cspblas_dcsrgemv
 # Dissecting the "cspblas_dcsrgemv" name:
 # "c" - for "c-blas" like interface (as opposed to fortran)
 #    Also means expects sparse arrays to use 0-based indexing, which python does
 # "sp"  for sparse
 # "d"   for double-precision
 # "csr" for compressed row format
 # "ge"  for "general", e.g., the matrix has no special structure such as symmetry
 # "mv"  for "matrix-vector" multiply

 if not sparse.isspmatrix_csr(A):
     raise Exception("Matrix must be in csr format")
 (m,n) = A.shape

 # The data of the matrix
 data    = A.data.ctypes.data_as(POINTER(c_double))
 indptr  = A.indptr.ctypes.data_as(POINTER(c_int))
 indices = A.indices.ctypes.data_as(POINTER(c_int))

 # Allocate output, using same conventions as input
 nVectors = 1
 if x.ndim is 1:
    y = np.empty(m,dtype=np.double,order='F')
    if x.size != n:
        raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size,n))
 elif x.shape[1] is 1:
    y = np.empty((m,1),dtype=np.double,order='F')
    if x.shape[0] != n:
        raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size,n))
 else:
    nVectors = x.shape[1]
    y = np.empty((m,nVectors),dtype=np.double,order='F')
    if x.shape[0] != n:
        raise Exception("x must have n entries. x.size is %d, n is %d" % (x.size,n))

 # Check input
 if x.dtype.type is not np.double:
    x = x.astype(np.double,copy=True)
 # Put it in column-major order, otherwise for nVectors > 1 this FAILS completely
 if x.flags['F_CONTIGUOUS'] is not True:
    x = x.copy(order='F')

 if nVectors == 1:
    np_x = x.ctypes.data_as(POINTER(c_double))
    np_y = y.ctypes.data_as(POINTER(c_double))
    # now call MKL. This returns the answer in np_y, which links to y
    SpMV(byref(c_char("N")), byref(c_int(m)),data ,indptr, indices, np_x, np_y ) 
 else:
    for columns in xrange(nVectors):
        xx = x[:,columns]
        yy = y[:,columns]
        np_x = xx.ctypes.data_as(POINTER(c_double))
        np_y = yy.ctypes.data_as(POINTER(c_double))
        SpMV(byref(c_char("N")), byref(c_int(m)),data,indptr, indices, np_x, np_y ) 

 return y

这篇关于使用MKL BLAS时,scipy是否支持稀疏矩阵乘法的多线程处理?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆