Matlab中的多线程稀疏矩阵乘法 [英] Multithreaded sparse matrix multiplication in Matlab

查看:439
本文介绍了Matlab中的多线程稀疏矩阵乘法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在执行一个NxN稀疏(〜1-2%)矩阵的矩阵乘法,我们称它为B,一个NxM密集矩阵,我们称它为A(其中M

现在,通常,矩阵乘法和大多数其他矩阵运算在Matlab中隐式并行化,即它们自动利用多个线程. 如果这两个矩阵都是稀疏的,则情况并非如此(请参见例如 StackOverflow讨论-对于预期的问题没有答案-和这基本上是没有答案的MathWorks线程). 这对我来说是一个不愉快的惊喜.

我们可以通过以下代码来验证多线程对稀疏矩阵操作没有影响:

clc; clear all; 

N = 5000;         % set matrix sizes
M = 3000;       
A = randn(N,M);   % create dense random matrices
B = sprand(N,N,0.015); % create sparse random matrix
Bf = full(B);     %create a dense form of the otherwise sparse matrix B

for i=1:3 % test for 1, 2, and 4 threads
  m(i) = 2^(i-1);
  maxNumCompThreads(m(i)); % set the thread count available to Matlab
  tic                      % starts timer
    y = B*A; 
  walltime(i) = toc;       % wall clock time
  speedup(i) = walltime(1)/walltime(i);
end

% display number of threads vs. speed up relative to just a single thread
[m',speedup']

这将产生以下输出,说明使用1个,2个和4个线程进行稀疏操作之间没有区别:

threads   speedup
1.0000    1.0000
2.0000    0.9950
4.0000    1.0155

另一方面,如果我用B的密集形式替换B,即上面的Bf,则我得到了明显的提速:

threads   speedup
1.0000    1.0000
2.0000    1.8894
4.0000    3.4841

(说明在Matlab中对稠密矩阵的矩阵运算确实隐式并行化了)

所以,我的问题是:是否有任何方法可以访问稀疏矩阵的矩阵操作的并行化/线程化版本(在Matlab中),而无需将其转换为密集形式? 我在MathWorks上发现了一个旧的涉及.mex文件的建议,但似乎链接已失效并且没有很好的文档记录/没有反馈?有其他选择吗?

这似乎是对隐式并行功能的相当严格的限制,因为稀疏矩阵在计算繁重的问题中比比皆是,并且在这些情况下非常需要超线程功能.

解决方案

MATLAB已经使用了 SuiteSparse Tim Davis ,了解其在稀疏矩阵上的许多操作(例如,请参见此处),但我都不相信它们是多线程的.

通常,对稀疏矩阵的计算是受内存限制的,而不是受CPU限制的.因此,即使您使用多线程库,我也怀疑您会在性能方面看到巨大的好处,至少不能与专门用于密集矩阵的那些媲美...

毕竟,稀疏矩阵的设计在思想上与常规稠密矩阵有不同的目标,高效的内存存储通常更为重要.


我快速进行了在线搜索,并找到了一些实现方法:

I am performing several matrix multiplications of an NxN sparse (~1-2%) matrix, let's call it B, with an NxM dense matrix, let's call it A (where M < N). N is large, as is M; on the order of several thousands. I am running Matlab 2013a.

Now, usually, matrix multiplications and most other matrix operations are implicitly parallelized in Matlab, i.e. they make use of multiple threads automatically. This appears NOT to be the case if either of the matrices are sparse (see e.g. this StackOverflow discussion - with no answer for the intended question - and this largely unanswered MathWorks thread). This is a rather unhappy surprise for me.

We can verify that multithreading has no effects for sparse matrix operations by the following code:

clc; clear all; 

N = 5000;         % set matrix sizes
M = 3000;       
A = randn(N,M);   % create dense random matrices
B = sprand(N,N,0.015); % create sparse random matrix
Bf = full(B);     %create a dense form of the otherwise sparse matrix B

for i=1:3 % test for 1, 2, and 4 threads
  m(i) = 2^(i-1);
  maxNumCompThreads(m(i)); % set the thread count available to Matlab
  tic                      % starts timer
    y = B*A; 
  walltime(i) = toc;       % wall clock time
  speedup(i) = walltime(1)/walltime(i);
end

% display number of threads vs. speed up relative to just a single thread
[m',speedup']

This produces the following output, which illustrates that there is no difference between using 1, 2, and 4 threads for sparse operations:

threads   speedup
1.0000    1.0000
2.0000    0.9950
4.0000    1.0155

If, on the other hand, I replace B by its dense form, refered to as Bf above, I get significant speedup:

threads   speedup
1.0000    1.0000
2.0000    1.8894
4.0000    3.4841

(illustrating that matrix operations for dense matrices in Matlab are indeed implicitly parallelized)

So, my question: is there any way at all to access a parallelized/threaded version of matrix operations for sparse matrices (in Matlab) without converting them to dense form? I found one old suggestion involving .mex files at MathWorks, but it seems the links are dead and not very well documented/no feedback? Any alternatives?

It seems to be a rather severe restriction of implicit parallelism functionality, since sparse matrices are abound in computationally heavy problems, and hyperthreaded functionality highly desirable in these cases.

解决方案

MATLAB already uses SuiteSparse by Tim Davis for many of its operation on sparse matrices (for example see here), but neither of which I believe are multithreaded.

Usually computations on sparse matrices are memory-bound rather than CPU-bound. So even you use a multithreaded library, I doubt you will see huge benefits in terms of performance, at least not comparable to those specialized in dense matrices...

After all that the design of sparse matrices have different goals in mind than regular dense matrices, where efficient memory storage is often more important.


I did a quick search online, and found a few implementations out there:

这篇关于Matlab中的多线程稀疏矩阵乘法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆