R Matrix: set particular elements of a sparse matrix to zero


Question

I have a reasonably large sparse matrix (dgCMatrix or dgTMatrix, but that is not very important here), and I want to set some of its elements to zero.
For example, I have a 3e4 * 3e4 matrix that is upper triangular and quite dense: ~23% of the elements are nonzero. (I actually have much bigger matrices, ~1e5 * 1e5, but they are much sparser.) In triplet dgTMatrix form it takes about 3.1 GB of RAM. Now I want to set to zero all elements that are less than some threshold (say, 1).

A very naive approach (also discussed elsewhere) is the following:

threshold <- 1
m[m < threshold] <- 0

But this solution is far from perfect: it takes 130 seconds to run (on a machine with enough RAM, so there is no swapping) and, more importantly, it needs ~25-30 GB of additional RAM.

The second solution I found (and am mostly happy with) is far more efficient. It constructs a new matrix from scratch:

threshold <- 1
ind <- which(m@x > threshold)
m <- sparseMatrix(i = m@i[ind], j = m@j[ind], x = m@x[ind], 
             dims = m@Dim, dimnames = m@Dimnames, 
             index1 = FALSE, 
             giveCsparse = FALSE, 
             check = FALSE)

It takes only about 6 seconds and needs about 5 GB of RAM.
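As a sanity check, the rebuild approach can be compared against plain dense thresholding on a toy matrix. This is only a minimal sketch: the matrix values are made up, and the dense copy is feasible only because the example is tiny. Note that the rebuild keeps entries with a strict `>`, so an entry exactly equal to the threshold would be dropped, whereas `m < threshold` would keep it; the toy values avoid that case.

```r
library(Matrix)

# Toy triplet matrix; the values are made up for illustration and contain
# no duplicated (i, j) pairs, which slot-wise filtering assumes.
m <- new("dgTMatrix",
         i = c(0L, 0L, 1L, 0L, 2L, 3L),
         j = c(0L, 1L, 1L, 3L, 3L, 3L),
         x = c(0.5, 2, 0.2, 3, 0.9, 1.5),
         Dim = c(4L, 4L))
threshold <- 1

# Reference result: threshold a dense copy (only feasible for small matrices)
expected <- as.matrix(m)
expected[expected < threshold] <- 0

# Rebuild from the surviving triplets, as in the question
ind <- which(m@x > threshold)
rebuilt <- sparseMatrix(i = m@i[ind], j = m@j[ind], x = m@x[ind],
                        dims = m@Dim, index1 = FALSE)

# TRUE when both approaches produce the same matrix
agree <- isTRUE(all.equal(as.matrix(rebuilt), expected))
```

Here `sparseMatrix()` is left at its default column-compressed output so that the sketch does not depend on the `giveCsparse` argument.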

The question is: can we do better? In particular, can we do this with less RAM? It would be perfect if this could be done in place.

Recommended answer

Like this:

library(Matrix)
m <- Matrix(0+1:28, nrow = 4)
m[-3,c(2,4:5,7)] <- m[ 3, 1:4] <- m[1:3, 6] <- 0
(m <- as(m, "dgTMatrix"))
m
#4 x 7 sparse Matrix of class "dgTMatrix"
#
#[1,] 1 .  9 .  .  .  .
#[2,] 2 . 10 .  .  .  .
#[3,] . .  . . 19  . 27
#[4,] 4 . 12 .  . 24  .

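threshold <- 5
# Flag the stored entries to drop (values at or below the threshold)
ind <- m@x <= threshold
# Filter all three triplet slots with the same mask so they stay aligned
m@x <- m@x[!ind]
m@i <- m@i[!ind]
m@j <- m@j[!ind]
m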
#4 x 7 sparse Matrix of class "dgTMatrix"
#
#[1,] . .  9 .  .  .  .
#[2,] . . 10 .  .  .  .
#[3,] . .  . . 19  . 27
#[4,] . . 12 .  . 24  .

The only additional RAM you need is for the ind vector. If you want to avoid that, you need a loop (probably in Rcpp for performance).
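The question also mentions dgCMatrix. The same slot filtering can be sketched for the column-compressed form, where the column pointer slot p has to be rebuilt along with i and x. The helper name `drop_below_csc` is hypothetical (it is not part of the Matrix package), and this version allocates an extra integer vector of column indices, so it favors simplicity over minimal memory use.

```r
library(Matrix)

# Hypothetical helper (not part of the Matrix package): drop stored entries
# that are <= threshold from a dgCMatrix by filtering its slots directly.
drop_below_csc <- function(m, threshold) {
  stopifnot(is(m, "dgCMatrix"))
  keep <- m@x > threshold
  # 1-based column index of every stored entry, recovered from the pointers
  col <- rep.int(seq_len(ncol(m)), diff(m@p))
  # New column pointers: cumulative count of surviving entries per column
  m@p <- c(0L, cumsum(tabulate(col[keep], nbins = ncol(m))))
  m@i <- m@i[keep]
  m@x <- m@x[keep]
  m
}

# Same nonzero pattern as the example above, built directly as a dgCMatrix
m <- sparseMatrix(i = c(1, 2, 4, 1, 2, 4, 3, 4, 3),
                  j = c(1, 1, 1, 3, 3, 3, 5, 6, 7),
                  x = c(1, 2, 4, 9, 10, 12, 19, 24, 27),
                  dims = c(4, 7))
res <- drop_below_csc(m, threshold = 5)
validObject(res)  # checks that the modified slots are still consistent
```

Because the row indices within each column keep their original order, the result should remain a valid dgCMatrix; calling `validObject()` after editing slots by hand is a cheap way to confirm that.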
