我什么时候应该使用`sparse`? [英] When should I be using `sparse`?

查看:75
本文介绍了我什么时候应该使用`sparse`?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在浏览Matlab的 sparse文档,试图查找是否存在有关何时使用稀疏表示而不是完整表示的任何准则.

I've been looking through Matlab's sparse documentation trying to find whether there are any guidelines for when it makes sense to use a sparse representation rather than a full representation.

例如,我有一个矩阵data,其中包含约30%的非零条目.我可以检查所用的内存.

For example, I have a matrix data with around 30% nonzero entries. I can check the memory used.

whos data
  Name             Size                 Bytes  Class     Attributes

  data      84143929x11            4394073488  double    sparse    


data = full(data);
whos data
  Name             Size                 Bytes  Class     Attributes

  data      84143929x11            7404665752  double              

在这里,我显然是在节省内存,但是对于任何包含30%非零条目的矩阵,这都是真的吗? 50%的非零条目呢?我应该以什么百分比切换到完整矩阵是否有经验法则?

Here, I'm clearly saving memory, but would this be true of any matrix with 30% nonzero entries? What about 50% nonzero entries? Is there a rule of thumb for at what percentage I should switch to a full matrix?

计算方面呢?通常,用稀疏矩阵进行矩阵乘法会变慢还是变快? 稀疏矩阵运算表示

What about computationally? Is it as a rule slower or faster to do a matrix multiplication with a sparse matrix? Sparse Matrix Operations says that

稀疏运算的计算复杂度与 nnz,矩阵中非零元素的数量.计算的 复杂度还线性取决于行大小m和列大小n 矩阵的平方,但与乘积m * n(总数)无关 零和非零元素的组合.

The computational complexity of sparse operations is proportional to nnz, the number of nonzero elements in the matrix. Computational complexity also depends linearly on the row size m and column size n of the matrix, but is independent of the product m*n, the total number of zero and nonzero elements.

在不知道更多细节的情况下,很难将其与完整矩阵进行比较.

This is difficult to compare to a full matrix without knowing more details.

Scipy的稀疏矩阵库说明了每种稀疏格式的优缺点.例如,对于 csc_matrix

Scipy's sparse matrix library explains pros and cons of each sparse format. For example for the csc_matrix

CSC格式的优点

Advantages of the CSC format

  • 高效的算术运算CSC + CSC,CSC * CSC等.
  • 有效的列切片
  • 快速矩阵向量乘积(CSR,BSR可能更快)
  • efficient arithmetic operations CSC + CSC, CSC * CSC, etc.
  • efficient column slicing
  • fast matrix vector products (CSR, BSR may be faster)

CSC格式的缺点

  • 慢行切片操作(考虑CSR)
  • 更改稀疏结构非常昂贵(考虑LIL或DOK)

是否存在有关Matlab sparse实现的类似信息?如果可以,我在哪里可以找到它?

Does similar information about Matlab's sparse implementation exist? If so where can I find it?

推荐答案

在完整矩阵上的许多操作都使用BLAS/LAPACK库调用,这些调用经过了疯狂的优化并且难以克服.实际上,在可以充分利用(i)稀疏性和(ii)特殊矩阵结构的特殊情况下,对稀疏矩阵进行的操作只会胜过对完整矩阵进行的操作.

Many operations on full matrices use BLAS/LAPACK library calls that are insanely optimized and tough to beat. In practice, operations on sparse matrices will only outperform those on full matrices in specialized situations that can sufficiently exploit (i) sparsity and (ii) special matrix structure.

仅仅随机使用稀疏可能会使您的情况变得更糟. 示例:将10000x10000完整矩阵添加到10000x10000完整矩阵中哪个更快?还是将10000x10000完整矩阵添加到完全稀疏(即一切为零)的10000x10000矩阵中?试试吧!在我的系统上,完整+完整速度更快!

Just randomly using sparse probably will make you worse off. Example: which is faster, adding a 10000x10000 full matrix to a 10000x10000 full matrix? Or adding a 10000x10000 full matrix to an entirely sparse (i.e. everything is zero) 10000x10000 matrix? try it! On my system, the full + full is faster!

示例1:求解线性系统A * x = b,其中A为5000x5000,但它是由500个5x5块构成的块对角矩阵.设置代码:

Example 1: solving linear system A*x=b where A is 5000x5000 but is block diagonal matrix constructed of 500 5x5 blocks. Setup code:

As = sparse(rand(5, 5));
for(i=1:999)
   As = blkdiag(As, sparse(rand(5,5))); 
end;                         %As is made up of 500 5x5 blocks along diagonal
Af = full(As); b = rand(5000, 1);

然后您可以测试速度差:

Then you can test speed difference:

As \ b % operation on sparse As takes .0012 seconds
Af \ b % solving with full Af takes about 2.3 seconds

通常,一个5000可变线性系统有些困难,但是1000个单独的5可变线性系统却是微不足道的.后者基本上是在稀疏情况下可以解决的问题.

In general, a 5000 variable linear system is somewhat difficult, but 1000 separate 5 variable linear systems is trivial. The latter is basically what gets solved in the sparse case.

总的来说,如果您具有特殊的矩阵结构并且可以巧妙地利用稀疏性,则有可能解决疯狂的大问题,否则这些问题将是棘手的.如果您有一个足够大的专业问题,拥有足够稀疏的矩阵,并且聪明地使用线性代数(以保持稀疏性),那么稀疏类型化的矩阵可能会非常有用.

The overall story is that if you have special matrix structure and can cleverly exploit sparsity, it's possible to solve insanely large problems that otherwise would be intractable. If you have a specialized problem that is sufficiently large, have a matrix that is sufficiently sparse, and are clever with linear algebra (so as to preserve sparsity), a sparse typed matrix can be extremely powerful.

另一方面,没有深思熟虑的情况下随机地进行稀疏处理几乎可以肯定会使您的代码变慢.

On the other hand, randomly throwing in sparse without deep, careful thought is almost certainly going to make your code slower.

这篇关于我什么时候应该使用`sparse`?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆