改善Scipy稀疏矩阵的乘法性能 [英] Improving Performance of Multiplication of Scipy Sparse Matrices

查看:124
本文介绍了改善Scipy稀疏矩阵的乘法性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个大小为(170k x 170k),具有4.4亿个非空点的Scipy CSC稀疏矩阵"sm"和一个具有几个非空点的稀疏CSC矢量"v"(170k x 1),可以提高操作性能的方法:

Given a Scipy CSC Sparse matrix "sm" with dimensions (170k x 170k) with 440 million non-null points and a sparse CSC vector "v" (170k x 1) with a few non-null points, is there anything that can be done to improve the performance of the operation:

resul = sm.dot(v)

?

当前大约需要1秒钟.随着CSR的增加,矩阵初始化时间最多增加了3秒,因此CSC的效果更好.

Currently it's taking roughly 1 second. Initializing the matrices as CSR increased the time up to 3 seconds, so CSC performed better.

SM是产品之间相似度的矩阵,而V是代表用户购买或点击了哪些产品的向量.因此,对于每个用户,sm都是相同的.

SM is a matrix of similarities between products and V is the vector that represents which products the user bought or clicked on. So for every user sm is the same.

我正在使用Ubuntu 13.04,Intel i3 @ 3.4GHz,4核.

I'm using Ubuntu 13.04, Intel i3 @3.4GHz, 4 Cores.

通过研究SO,我了解了Ablas软件包.我输入了终端:

Researching on SO I read about Ablas package. I typed into the terminal:

~$ ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so

导致的结果:

    linux-vdso.so.1 =>  (0x00007fff56a88000)
    libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f888137f000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8880fb7000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8880cb1000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f888183c000)

据我了解,这意味着我已经在使用Ablas的高性能软件包.我仍然不确定这个程序包是否已经实现了并行计算,但是看起来还没有.

And for what I understood this means that I'm already using a high performance package from Ablas. I'm still not sure though if this package already implements parallel computing but it looks like it doesn't.

多核处理可以帮助提高性能吗?如果是这样,是否有任何库对python有帮助?

Could multi-core processing help to boost performance? If so, is there any library that could be helpful in python?

我也在考虑在Cython中实现此想法,但是我不知道这是否会带来良好的结果.

I was also considering the idea of implementing this in Cython but I don't know if this would lead to good results.

谢谢.

推荐答案

稀疏矩阵乘法例程直接用C ++进行编码,并且从源代码的快速浏览来看,似乎没有任何关联任何优化的库.此外,似乎没有利用第二矩阵是向量来最小化计算的事实.因此,您可以通过访问稀疏矩阵的内脏并自定义乘法算法来大大加快速度.下面的代码在纯Python/Numpy中执行此操作,并且如果向量确实具有一些非空点",则它与scipy的C ++代码的速度相匹配:如果在Cython中实现,则速度会显着提高:

The sparse matrix multiplication routines are directly coded in C++, and as far as a quick look at the source reveals, there doesn't seem to be any hook to any optimized library. Furthermore, it doesn't seem to be taking advantage of the fact that the second matrix is a vector to minimize calculations. So you can probably speed things up quite a bit by accessing the guts of the sparse matrix, and customizing the multiplication algorithm. The following code does so in pure Python/Numpy, and if the vector really has "a few non-null points" it matches the speed of scipy's C++ code: if you implemented it in Cython, the speed increase should be noticeable:

def sparse_col_vec_dot(csc_mat, csc_vec):
    # row numbers of vector non-zero entries
    v_rows = csc_vec.indices
    v_data = csc_vec.data
    # matrix description arrays
    m_dat = csc_mat.data
    m_ind = csc_mat.indices
    m_ptr = csc_mat.indptr
    # output arrays
    sizes = m_ptr.take(v_rows+1) - m_ptr.take(v_rows)
    sizes = np.concatenate(([0], np.cumsum(sizes)))
    data = np.empty((sizes[-1],), dtype=csc_mat.dtype)
    indices = np.empty((sizes[-1],), dtype=np.intp)
    indptr = np.zeros((2,), dtype=np.intp)

    for j in range(len(sizes)-1):
        slice_ = slice(*m_ptr[[v_rows[j] ,v_rows[j]+1]])
        np.multiply(m_dat[slice_], v_data[j], out=data[sizes[j]:sizes[j+1]])
        indices[sizes[j]:sizes[j+1]] = m_ind[slice_]
    indptr[-1] = len(data)
    ret = sps.csc_matrix((data, indices, indptr),
                         shape=csc_vec.shape)
    ret.sum_duplicates()

    return ret

发生的事情的简要说明:CSC矩阵定义为三个线性数组:

A quick explanation of what is going on: a CSC matrix is defined in three linear arrays:

  • data包含以列主顺序存储的非零条目.
  • indices包含非零条目的行.
  • indptr的条目比矩阵的列数多,并且j列的项目在data[indptr[j]:indptr[j+1]]中找到,并且在indices[indptr[j]:indptr[j+1]]行中.
  • data contains the non-zero entries, stored in column major order.
  • indices contains the rows of the non-zero entries.
  • indptr has one entry more than the number of columns of the matrix, and items in column j are found in data[indptr[j]:indptr[j+1]] and are in rows indices[indptr[j]:indptr[j+1]].

因此要乘以稀疏的列向量,可以对列向量的dataindices进行迭代,对于每个(d, r)对,提取矩阵的相应列,然后将其乘以d ,即data[indptr[r]:indptr[r+1]] * dindices[indptr[r]:indptr[r+1]].

So to multiply by a sparse column vector, you can iterate over data and indices of the column vector, and for each (d, r) pair, extract the corresponding column of the matrix and multiply it by d, i.e. data[indptr[r]:indptr[r+1]] * d and indices[indptr[r]:indptr[r+1]].

这篇关于改善Scipy稀疏矩阵的乘法性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆