R: sparse matrix conversion

Problem description

I have a matrix of factors in R and want to convert it to a 0-1 matrix of dummy variables covering all possible levels of each factor.

However, this "dummy" matrix is very large (91690x16593) and very sparse. I need to store it as a sparse matrix; otherwise it does not fit in my 12 GB of RAM.

Currently, I am using the following code, which works fine and takes only seconds:

library(Matrix)
X_factors <- data.frame(lapply(my_matrix, as.factor))
# encode the factor data in a sparse matrix
X <- sparse.model.matrix(~.-1, data = X_factors)
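
To make the encoding step concrete, here is a minimal sketch on a made-up data frame. The names toy and X_toy and the factor values are purely illustrative and are not from my real data; the call to sparse.model.matrix is the same as above.

```r
library(Matrix)

# Hypothetical toy data: two factor columns.
toy <- data.frame(colour = factor(c("red", "green", "blue", "red")),
                  size   = factor(c("S", "L", "S", "L")))

# Encode the factor levels as 0-1 indicator columns, stored sparsely.
X_toy <- sparse.model.matrix(~ . - 1, data = toy)

class(X_toy)      # a sparse class from the Matrix package
dim(X_toy)        # one row per observation
colnames(X_toy)   # indicator columns named after factor and level
```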

However, I want to use the e1071 package in R, and eventually save this matrix to libsvm format with write.matrix.csr(), so first I need to convert my sparse matrix to the SparseM format.

I tried:

library(SparseM)
X2 <- as.matrix.csr(X)

but it very quickly fills my RAM and eventually R crashes. I suspect that, internally, as.matrix.csr first converts the sparse matrix to a dense matrix that does not fit in my computer's memory.
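
A quick back-of-the-envelope check of that suspicion, assuming the dense intermediate would be an ordinary numeric matrix with 8 bytes per cell. This only shows that a dense copy would be too large; it does not prove what as.matrix.csr does internally.

```r
# Dimensions quoted above.
n_rows <- 91690
n_cols <- 16593

# A dense numeric matrix stores every cell as an 8-byte double.
dense_bytes <- n_rows * n_cols * 8
dense_gb <- dense_bytes / 1e9

round(dense_gb, 2)
# [1] 12.17
```

So a single dense copy alone would already need about as much memory as the whole machine has.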

My other option would be to create the sparse matrix directly in the SparseM format.
I tried as.matrix.csr(X_factors), but it does not accept a data frame of factors.

Is there an equivalent of sparse.model.matrix(~.-1, data = X_factors) in the SparseM package? I searched the documentation but did not find one.

Solution

Quite tricky but I think I got it.

Let's start with a sparse matrix from the Matrix package:

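library(Matrix)
library(SparseM)

# Small 8 x 10 example with seven non-zero entries
i <- c(1, 3:8)        # row indices (1-based)
j <- c(2, 9, 6:10)    # column indices (1-based)
x <- 7 * (1:7)        # non-zero values
X <- sparseMatrix(i, j, x = x)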

The Matrix package uses a compressed column-oriented format, while SparseM supports both column-oriented and row-oriented formats and has functions that handle the conversion from one to the other.
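
To see what the column-oriented storage looks like, you can inspect the slots of the small example matrix. The sketch below rebuilds the same X so that it runs on its own; the slot names i, p and x are those of the dgCMatrix class.

```r
library(Matrix)

# Same small example as above.
X <- sparseMatrix(i = c(1, 3:8), j = c(2, 9, 6:10), x = 7 * (1:7))

X@i   # 0-based row index of each non-zero, stored column by column
# [1] 0 3 4 5 2 6 7
X@p   # 0-based column pointers, one more entry than there are columns
#  [1] 0 0 1 1 1 1 2 3 4 6 7
X@x   # the non-zero values, in the same order as X@i
# [1]  7 21 28 35 14 42 49
```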

So we will first convert our column-oriented Matrix object into a column-oriented SparseM matrix. We just need to call the right constructor and keep in mind that the two packages use different index conventions: the slots of the Matrix object are 0-based, while SparseM expects 1-based indices.

X.csc <- new("matrix.csc", ra = X@x,        # non-zero values
                           ja = X@i + 1L,   # row indices, shifted to 1-based
                           ia = X@p + 1L,   # column pointers, shifted to 1-based
                           dimension = X@Dim)

Then, change from column-oriented to row-oriented format:

X.csr <- as.matrix.csr(X.csc)
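
If you need this more than once, the two steps can be wrapped in a small function. This is only a sketch: dgc_to_csr is a name I made up (it is not part of Matrix or SparseM), and it assumes the input is a dgCMatrix, which is what sparseMatrix returns here for numeric values.

```r
library(Matrix)
library(SparseM)

# Hypothetical helper: convert a Matrix dgCMatrix into a SparseM matrix.csr
# by copying its slots, without building a dense matrix first.
dgc_to_csr <- function(m) {
  stopifnot(is(m, "dgCMatrix"))
  csc <- new("matrix.csc",
             ra = m@x,          # non-zero values
             ja = m@i + 1L,     # row indices, shifted to 1-based
             ia = m@p + 1L,     # column pointers, shifted to 1-based
             dimension = m@Dim)
  as.matrix.csr(csc)            # column-oriented -> row-oriented
}

# Usage on the small example from above.
X <- sparseMatrix(i = c(1, 3:8), j = c(2, 9, 6:10), x = 7 * (1:7))
X.csr <- dgc_to_csr(X)
class(X.csr)
```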

And you're done! You can check that the two matrices are identical (on my small example) by doing:

range(as.matrix(X) - as.matrix(X.csc))
# [1] 0 0
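
To finish the workflow from the question, the row-oriented matrix can then be passed to write.matrix.csr from e1071. A minimal sketch on the small example, assuming e1071 is installed; the output path out_file is just a temporary file chosen for illustration.

```r
library(Matrix)
library(SparseM)

# Rebuild the small example and convert it as described above.
X <- sparseMatrix(i = c(1, 3:8), j = c(2, 9, 6:10), x = 7 * (1:7))
X.csc <- new("matrix.csc", ra = X@x, ja = X@i + 1L, ia = X@p + 1L,
             dimension = X@Dim)
X.csr <- as.matrix.csr(X.csc)

# Write the matrix in libsvm format (column:value pairs, one line per row).
out_file <- tempfile(fileext = ".dat")   # illustrative path only
e1071::write.matrix.csr(X.csr, file = out_file)

readLines(out_file)
```

I have only tried this on the toy matrix, so I cannot say how long the write takes on a matrix of your size.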
