R: sparse matrix conversion
I have a matrix of factors in R and want to convert it to a 0-1 matrix of dummy variables covering all possible levels of each factor.
However, this "dummy" matrix is very large (91690x16593) and very sparse. I need to store it as a sparse matrix, otherwise it does not fit in my 12 GB of RAM.
Currently I am using the following code, which works fine and takes only seconds:
library(Matrix)
X_factors <- data.frame(lapply(my_matrix, as.factor))
#encode factor data in a sparse matrix
X <- sparse.model.matrix(~.-1, data = X_factors)
However, I want to use the e1071 package in R and eventually save this matrix in libsvm format with write.matrix.csr(), so I first need to convert my sparse matrix to the SparseM format.
I tried to do:
library(SparseM)
X2 <- as.matrix.csr(X)
but it very quickly fills my RAM and eventually R crashes. I suspect that internally, as.matrix.csr first converts the sparse matrix to a dense matrix that does not fit in my computer's memory.
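A rough back-of-the-envelope check is consistent with that suspicion. The sketch below only assumes that a dense numeric matrix in R stores each element as an 8-byte double; the variable names are made up for illustration.

```r
# Dimensions of the dummy matrix from the question
n_rows <- 91690
n_cols <- 16593

# A dense numeric matrix needs 8 bytes per element (double precision)
bytes_dense <- n_rows * n_cols * 8

# Size in GiB: a single dense copy already takes more than 11 GiB
bytes_dense / 2^30
```

So one dense copy alone would nearly exhaust 12 GB of RAM, before counting any temporary copies made during the conversion.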
My other option would be to create the sparse matrix directly in the SparseM format.
I tried as.matrix.csr(X_factors), but it does not accept a data frame of factors.
Is there an equivalent of sparse.model.matrix(~.-1, data = X_factors) in the SparseM package? I searched the documentation but did not find one.
Quite tricky but I think I got it.
Let's start with a sparse matrix from the Matrix package:
library(Matrix)
library(SparseM)
# small 8 x 10 example with 7 nonzero entries
i <- c(1, 3:8)
j <- c(2, 9, 6:10)
x <- 7 * (1:7)
X <- sparseMatrix(i, j, x = x)
The Matrix package uses a column-oriented compressed format, while SparseM supports both column-oriented and row-oriented formats and has functions that handle the conversion from one to the other.
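To make the column-oriented layout concrete, here is a small sketch that inspects the slots of the example matrix. It rebuilds the same X so that it runs on its own; `rows` and `cols` are illustrative names, not part of either package.

```r
library(Matrix)

# Same small example as above
X <- sparseMatrix(i = c(1, 3:8), j = c(2, 9, 6:10), x = 7 * (1:7))

X@x  # nonzero values, stored column by column
X@i  # row index of each stored value, 0-based
X@p  # 0-based offsets into x and i, one per column plus a final total

# Recover 1-based (row, column) pairs from the compressed slots:
# diff(X@p) gives the number of stored entries in each column
rows <- X@i + 1L
cols <- rep(seq_len(ncol(X)), diff(X@p))
cbind(row = rows, col = cols, value = X@x)
```

The point to notice is that both the `i` and `p` slots count from 0, which matters when handing them to another package.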
So we first convert our column-oriented Matrix object into a column-oriented SparseM matrix. We just need to call the right constructor and keep in mind that the two packages use different index conventions: the Matrix slots start at 0, while SparseM expects indices that start at 1:
X.csc <- new("matrix.csc",
             ra = X@x,             # nonzero values
             ja = X@i + 1L,        # row indices, shifted to 1-based
             ia = X@p + 1L,        # column pointers, shifted to 1-based
             dimension = X@Dim)
Then, change from column-oriented to row-oriented format:
X.csr <- as.matrix.csr(X.csc)
And you're done! You can check that the two matrices are identical (on my small example) by doing:
range(as.matrix(X) - as.matrix(X.csc))
# [1] 0 0
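For the original use case, the two steps can be wrapped into one function and applied to the output of sparse.model.matrix. This is only a sketch: `dgc_to_csr` is a hypothetical helper name (it does not exist in either package), the toy data frame stands in for the real X_factors, and the function assumes its input is a dgCMatrix, which is what sparse.model.matrix returned for me.

```r
library(Matrix)
library(SparseM)

# Hypothetical helper (not part of Matrix or SparseM):
# convert a dgCMatrix to a SparseM matrix.csr via the slots, without as.matrix()
dgc_to_csr <- function(X) {
  stopifnot(is(X, "dgCMatrix"))
  X.csc <- new("matrix.csc",
               ra = X@x,
               ja = X@i + 1L,   # 0-based -> 1-based row indices
               ia = X@p + 1L,   # 0-based -> 1-based column pointers
               dimension = X@Dim)
  as.matrix.csr(X.csc)          # column-oriented -> row-oriented
}

# Toy stand-in for the factor data in the question
X_factors <- data.frame(a = factor(c("u", "v", "u", "w")),
                        b = factor(c("p", "p", "q", "q")))
X <- sparse.model.matrix(~ . - 1, data = X_factors)
X.csr <- dgc_to_csr(X)

# Optional: write the result in libsvm format if e1071 is installed
if (requireNamespace("e1071", quietly = TRUE)) {
  out_file <- tempfile(fileext = ".libsvm")
  e1071::write.matrix.csr(X.csr, file = out_file)
}
```

Because the conversion copies the slots directly, it should stay in sparse form throughout, though I have only checked it on small inputs like the one above.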