Getting "node stack overflow" when cbind-ing multiple sparse matrices
Question
I have 100,000 sparse matrices ("dgCMatrix") stored in a list object. Every matrix has the same number of rows (8,000,000), and the list is approximately 25 GB in size. Now when I do:
do.call(cbind, theListofMatrices)
to combine all matrices into one big sparse matrix, I get "node stack overflow". In fact, I can't even do this with only 500 elements of that list, which should produce a sparse matrix of only about 100 MB.
My guess is that the cbind() function converts the sparse matrices to ordinary dense matrices, and that this causes the stack overflow?
Actually, I have tried something like this:
tmp = do.call(cbind, theListofMatrices[1:400])
This works fine, and tmp is still a sparse matrix with a size of 95 MB. Then I tried:
> tmp = do.call(cbind, theListofMatrices[1:410])
Error in stopifnot(0 <= deparse.level, deparse.level <= 2) :
node stack overflow
and the error occurred. However, I have no trouble doing something like:
cbind(tmp, tmp, tmp, tmp)
so I believe it has something to do with do.call().
Reduce() seems to solve my problem, though I still don't know why do.call() fails.
Recommended answer
The problem is not in do.call() but in the way cbind from the Matrix package is implemented. It uses recursion to bind the individual arguments together. For instance, Matrix::cbind(mat1, mat2, mat3) is translated to something along the lines of Matrix::cbind(mat1, Matrix::cbind(mat2, mat3)).
Since do.call(cbind, theListofMatrices) is basically cbind(theListofMatrices[[1]], theListofMatrices[[2]], ...), you pass too many arguments to the cbind function, and you end up with a recursion that is nested too deeply, so it fails.
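To make the depth argument concrete, here is a toy model in plain R. It is only a sketch: bind_recursive and bind_iterative are hypothetical names, they operate on ordinary vectors, and they are not the Matrix package's code. The point is simply that a "first element plus the rest" recursion nests one call level per element, while a loop stays at a constant depth.

```r
# Toy model only: this is NOT the Matrix package's implementation.

# Recursive form: bind the first element to the result of binding the rest,
# so the call depth grows by one level per element.
bind_recursive <- function(xs, depth = 1L) {
  if (length(xs) == 1L) {
    return(list(value = xs[[1L]], depth = depth))
  }
  rest <- bind_recursive(xs[-1L], depth + 1L)
  list(value = c(xs[[1L]], rest$value), depth = rest$depth)
}

# Iterative form: a loop folds the elements one by one at constant call depth.
bind_iterative <- function(xs) {
  acc <- xs[[1L]]
  for (x in xs[-1L]) acc <- c(acc, x)
  acc
}

xs <- as.list(1:5)
res <- bind_recursive(xs)
res$depth                                  # deepest call level reached
identical(res$value, bind_iterative(xs))   # both forms build the same result
```

With a few hundred arguments, each level adding several internal calls, a nesting of this kind can plausibly exhaust R's node stack, which would match the failure between 400 and 410 matrices reported in the question.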
Thus, Ben's suggestion in the comments to use Reduce() is a good way to work around the issue, since it avoids the recursion and replaces it with an iteration:
tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])
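The Reduce() call can be tried on a small scale first. The sketch below uses a hypothetical stand-in list, mats, of 50 tiny random sparse matrices created with rsparsematrix() from the Matrix package, and checks that the result stays sparse. It also includes cbind_chunked, a hypothetical helper that is not part of the original answer: it binds the list in small groups with do.call() and repeats until one matrix remains, on the assumption that each group is small enough to stay below the recursion limit. This may help with very long lists, because Reduce() builds a new, larger matrix at every step, but the benefit is an untested assumption here.

```r
library(Matrix)

# Hypothetical stand-in for theListofMatrices: 50 small sparse matrices
# that all have the same number of rows.
set.seed(1)
mats <- lapply(1:50, function(i) rsparsematrix(nrow = 20, ncol = 3, density = 0.1))

# Approach from the answer: fold the list with two-argument cbind calls.
by_reduce <- Reduce(cbind, mats[-1], mats[[1]])

# Hypothetical alternative (not from the answer): bind in groups of
# chunk_size, then bind the group results, until one matrix is left.
cbind_chunked <- function(mats, chunk_size = 100L) {
  stopifnot(length(mats) >= 1L, chunk_size >= 2L)
  while (length(mats) > 1L) {
    # Group indices 1..chunk_size, chunk_size+1..2*chunk_size, and so on.
    groups <- split(mats, ceiling(seq_along(mats) / chunk_size))
    mats <- unname(lapply(groups, function(g) {
      if (length(g) == 1L) g[[1L]] else do.call(cbind, g)
    }))
  }
  mats[[1L]]
}

by_chunks <- cbind_chunked(mats, chunk_size = 8L)

is(by_reduce, "sparseMatrix")   # check that the result is still sparse
dim(by_reduce)                  # rows are kept, columns are summed
identical(as.matrix(by_reduce), as.matrix(by_chunks))
```

The default chunk_size of 100 is an arbitrary choice, picked only because the question reports that 400 matrices still worked in a single do.call().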