Bind together sparse model matrices by row names

Question

I am trying to construct a large sparse matrix with a split-apply-combine approach: I call sparse.model.matrix() from the Matrix package separately on subsets of the columns of a data frame and then bind the results together into one full matrix. I have to do this because of memory limitations (I can't call sparse.model.matrix() on the whole data frame at once). This works, and I get a list of sparse matrices, but they have different numbers of rows, so I can't bind them together.

Example:

library(Matrix)
data(iris)
set.seed(100)
iris$v6 <- sample(c("a","b","c",NA), 150, replace=TRUE)
iris$v7 <- sample(c("x","y",NA), 150, replace = TRUE)

sparse_m1 <- sparse.model.matrix(~., iris[,1:5])
sparse_m2 <- sparse.model.matrix(~.-1, iris[, 6:7])

dim(sparse_m1)
[1] 150   7

dim(sparse_m2)
[1] 71  4

cbind2(sparse_m1, sparse_m2)
Error: Matrices must have same number of rows in cbind2(sparse_m1, sparse_m2)

cbind(sparse_m1, sparse_m2)
Error: Matrices must have same number of rows in cbind2(..1, r)

The matrices have the same row names; some rows are simply missing from sparse_m2 because they had a missing value in v6 or v7 (the default na.action, na.omit, drops every row with an NA in any model variable). Is there any way to combine them by row name?
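
A quick check, sketched here by rebuilding the example data, that sparse_m2 keeps exactly the rows of iris with no NA in v6 or v7 (the rows kept by R's default na.action, na.omit) and that all of those row names also occur in sparse_m1. It assumes options("na.action") is left at its default; both comparisons at the end should return TRUE.

```r
library(Matrix)

# Rebuild the example data from the question
data(iris)
set.seed(100)
iris$v6 <- sample(c("a", "b", "c", NA), 150, replace = TRUE)
iris$v7 <- sample(c("x", "y", NA), 150, replace = TRUE)
sparse_m1 <- sparse.model.matrix(~., iris[, 1:5])
sparse_m2 <- sparse.model.matrix(~. - 1, iris[, 6:7])

# Rows that survive na.omit: no NA in either v6 or v7
kept <- rownames(iris)[complete.cases(iris[, 6:7])]

# sparse_m2 should carry exactly those row names ...
identical(rownames(sparse_m2), kept)
# ... and every one of them also names a row of sparse_m1
# (iris[, 1:5] has no missing values, so sparse_m1 keeps all 150 rows)
all(rownames(sparse_m2) %in% rownames(sparse_m1))
```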

I also tried rbind.fill.matrix() from the plyr package on the transposed matrices (transpose, bind, then transpose back), but then I lose the column names, because rbind.fill.matrix() ignores row names.

Any ideas?

Solution

I recently ran into the same issue. Nowadays you can use rBind.fill() from the Matrix.utils package:

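install.packages("Matrix.utils")
library(Matrix.utils)
# rBind.fill() binds rows and matches columns by name, so transpose first
# (row names become column names), bind, then transpose back
sparse_filled <- t(rBind.fill(t(sparse_m1), t(sparse_m2)))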
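
If Matrix.utils is not available, a similar result can be sketched with the Matrix package alone: pad the smaller matrix with zero rows so that its rows match the larger one by name, then cbind() as usual. The helper pad_rows() below is hypothetical (it is not part of Matrix); it rebuilds the matrix from its triplet form with sparseMatrix(), so no dense copy is ever created.

```r
library(Matrix)

# Hypothetical helper (not part of Matrix): return a copy of sparse matrix `m`
# with one row per name in `all_rows`, in that order. Rows of `m` are placed
# by name; names in `all_rows` that are absent from `m` become all-zero rows.
pad_rows <- function(m, all_rows) {
  idx <- match(rownames(m), all_rows)
  if (anyNA(idx)) stop("every row name of `m` must appear in `all_rows`")
  trip <- as(m, "TsparseMatrix")  # triplet form; slots i and j are 0-based
  sparseMatrix(
    i = idx[trip@i + 1L],         # move each nonzero to its row's new position
    j = trip@j + 1L,
    x = trip@x,
    dims = c(length(all_rows), ncol(m)),
    dimnames = list(all_rows, colnames(m))
  )
}

# Small self-contained demo: `part` lacks row "r2" and lists its rows out of order
full <- Matrix(c(1, 0, 2, 0, 3, 0), nrow = 3, sparse = TRUE,
               dimnames = list(c("r1", "r2", "r3"), c("a", "b")))
part <- sparseMatrix(i = c(1, 2), j = c(1, 1), x = c(5, 7), dims = c(2, 1),
                     dimnames = list(c("r3", "r1"), "z"))
combined <- cbind(full, pad_rows(part, rownames(full)))
combined
```

With the matrices from the question, cbind(sparse_m1, pad_rows(sparse_m2, rownames(sparse_m1))) should give a 150 x 11 matrix whose rows line up by name. Note that the padded rows are all zeros rather than NA, so downstream code cannot tell them apart from genuine all-zero rows.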