Sparse (dgCMatrix) matrix row-normalization in R

Question

I have a large sparse matrix, call it P:

 > str(P)
   Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
   ..@ i       : int [1:7868093] 4221 6098 8780 10313 11102 14243 20570 22145 24468 24977 ...
   ..@ p       : int [1:7357] 0 0 269 388 692 2434 3662 4179 4205 4256 ...
   ..@ Dim     : int [1:2] 1303967 7356
   ..@ Dimnames:List of 2
   .. ..$ : NULL
   .. ..$ : NULL
   ..@ x       : num [1:7868093] 1 1 1 1 1 1 1 1 1 1 ...
   ..@ factors : list()

I'd like to row-normalize it (say, with the L2 norm). Taking advantage of vector recycling, the straightforward approach would be something like:

> row_normalized_P <- P / sqrt(rowSums(P^2))

But this causes a memory allocation error, since it appears the rowSums result is being recycled into a dense matrix with dimensions equal to dim(P). Given that P is known to be sparse (or at the very least is stored in sparse format), does anyone know of a non-iterative approach to achieve the desired row_normalized_P shown above? (I.e. the resultant matrix will be equally sparse as P itself... and I'd like to avoid ever having a dense matrix allocated during the normalization steps.)
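
To see why a dense intermediate is fatal here, a rough back-of-the-envelope estimate helps. The sketch below uses only the dimensions and non-zero count from the `str(P)` output; the byte counts (8 per double, 4 per integer) are the usual R storage sizes, and object overhead is ignored.

```r
# Dimensions and non-zero count copied from the str(P) output above.
n_rows <- 1303967
n_cols <- 7356
n_nonzero <- 7868093

# A dense double matrix stores 8 bytes per cell.
dense_gb <- n_rows * n_cols * 8 / 1e9

# Rough dgCMatrix footprint: 8 bytes per value in @x, 4 bytes per row
# index in @i, and 4 bytes per column pointer in @p (overhead ignored).
sparse_mb <- (n_nonzero * (8 + 4) + (n_cols + 1) * 4) / 1e6

cat(sprintf("dense: ~%.1f GB, sparse: ~%.1f MB\n", dense_gb, sparse_mb))
```

That is roughly 77 GB for the dense form against under 100 MB for the sparse one, so any step that densifies P will fail on ordinary hardware.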

The only semi-efficient method I've found around this is to apply across rows (more accurately through blocks of rows coerced into dense sub-matrices) of P, but I'd like to try to remove the looping logic from my codebase if I can, and I'm wondering if perhaps there's a built-in in the Matrix package (that I'm just not aware of) that helps with this particular type of computation.

Cheers, and thanks for any help!

-村寨

Recommended answer

I figured out a nice solution (as usual, about 15 minutes after posting :-/ )...

> row_normalized_P <- Matrix::Diagonal(x = 1 / sqrt(Matrix::rowSums(P^2))) %*% P
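
This works because left-multiplying by a diagonal matrix scales row i of P by the i-th diagonal entry, and the product of a sparse diagonal matrix with a sparse matrix stays sparse. A minimal sketch on a toy matrix follows. The data are made up for illustration, and the guard for all-zero rows is an added assumption (the original one-liner would divide by zero for such rows). The second variant, which rescales the stored values directly through the `@i` slot, is an alternative not mentioned in the answer.

```r
library(Matrix)

# Small stand-in for P (hypothetical data); row 3 is deliberately all zero.
P <- sparseMatrix(i = c(1, 1, 2, 4), j = c(1, 3, 2, 3),
                  x = c(3, 4, 5, 2), dims = c(4, 3))

# L2 norm of each row, computed without densifying P.
row_norms <- sqrt(rowSums(P^2))

# Assumption: all-zero rows should stay zero, so use a scale of 1
# for them instead of 1/0.
row_scale <- ifelse(row_norms > 0, 1 / row_norms, 1)

# Left-multiplying by a diagonal matrix rescales each row of P.
row_normalized_P <- Diagonal(x = row_scale) %*% P

# Alternative sketch: rescale the stored values in place. @i holds the
# 0-based row index of each stored value, hence the + 1L.
P2 <- P
P2@x <- P2@x * row_scale[P2@i + 1L]
```

Both results should have unit L2 norm in every non-empty row, which can be checked with `sqrt(rowSums(row_normalized_P^2))`.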
