R - data frame - convert to sparse matrix
Question
I have a data frame that is mostly zeros (a sparse data frame?), similar to this:
name,factor_1,factor_2,factor_3
ABC,1,0,0
DEF,0,1,0
GHI,0,0,1
The actual data is about 90,000 rows with 10,000 features. Can I convert this to a sparse matrix? I am expecting to gain time and space efficiencies by utilizing a sparse matrix instead of a data frame.
Any help would be greatly appreciated.
Update #1: Here is some code to generate the data frame. Thanks to Richard for providing this.
x <- structure(list(name = structure(1:3, .Label = c("ABC", "DEF", "GHI"),
                                     class = "factor"),
                    factor_1 = c(1L, 0L, 0L),
                    factor_2 = c(0L, 1L, 0L),
                    factor_3 = c(0L, 0L, 1L)),
               .Names = c("name", "factor_1", "factor_2", "factor_3"),
               class = "data.frame",
               row.names = c(NA, -3L))
Recommended answer
It might be a bit more memory efficient (but slower) to avoid copying all the data into a dense matrix:
y <- Reduce(cbind2, lapply(x[,-1], Matrix, sparse = TRUE))
rownames(y) <- x[,1]
#3 x 3 sparse Matrix of class "dgCMatrix"
#
#ABC 1 . .
#DEF . 1 .
#GHI . . 1
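Another way to avoid a dense intermediate, shown here only as a rough sketch and not as part of the original answer, is to collect the positions of the non-zero entries column by column and pass them to sparseMatrix() from the Matrix package. The names cols, nz, i, j, v and y2 are made up for illustration, and the data frame is rebuilt with data.frame() so the example is self-contained.

```r
library(Matrix)

# Same toy data as in the question, rebuilt so the sketch runs on its own
x <- data.frame(
  name     = c("ABC", "DEF", "GHI"),
  factor_1 = c(1L, 0L, 0L),
  factor_2 = c(0L, 1L, 0L),
  factor_3 = c(0L, 0L, 1L)
)

cols <- x[, -1]

# Row positions of the non-zero entries, one integer vector per column
nz <- lapply(cols, function(col) which(col != 0))

i <- unlist(nz, use.names = FALSE)        # row index of each non-zero
j <- rep(seq_along(nz), lengths(nz))      # matching column index
# The non-zero values themselves, in the same order as i and j
v <- unlist(Map(function(col, idx) col[idx], cols, nz), use.names = FALSE)

# Build the sparse matrix directly from (row, column, value) triplets
y2 <- sparseMatrix(
  i = i, j = j, x = as.numeric(v),
  dims = c(nrow(x), ncol(cols)),
  dimnames = list(as.character(x$name), names(cols))
)
```

This only ever touches one column of the data frame at a time, so peak memory should stay close to the size of the data frame plus the non-zero triplets. Whether it is faster or slower than the Reduce/cbind2 version on 90,000 x 10,000 data would have to be measured.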
If you have sufficient memory, you should use Richard's answer, i.e., turn your data.frame into a dense matrix and then use Matrix.
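Richard's answer is not reproduced here, so the following is only a guess at what the dense route might look like: convert the numeric columns with as.matrix() and hand the result to Matrix() with sparse = TRUE. The names m and y_dense_route are hypothetical. The object.size() calls are there to compare the footprint of the two representations.

```r
library(Matrix)

# Same toy data as in the question, rebuilt so the sketch runs on its own
x <- data.frame(
  name     = c("ABC", "DEF", "GHI"),
  factor_1 = c(1L, 0L, 0L),
  factor_2 = c(0L, 1L, 0L),
  factor_3 = c(0L, 0L, 1L)
)

# Dense intermediate: this is the step that needs "sufficient memory"
m <- as.matrix(x[, -1])
rownames(m) <- as.character(x$name)

# Convert the dense matrix to a sparse representation
y_dense_route <- Matrix(m, sparse = TRUE)

# Compare the in-memory size of the dense and sparse objects
print(object.size(m))
print(object.size(y_dense_route))
```

Matrix() chooses the result class based on the structure of its input, so the class seen on this toy example is not necessarily the one you would get on the real data. On a 3 x 3 example the sparse object may well be the larger of the two because of its fixed overhead; any saving would only be expected at the scale described in the question.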