R - data frame - convert to sparse matrix


Question

I have a data frame that is mostly zeros (a sparse data frame?), similar to:

name,factor_1,factor_2,factor_3
ABC,1,0,0
DEF,0,1,0
GHI,0,0,1

The actual data has about 90,000 rows and 10,000 features. Can I convert this to a sparse matrix? I expect to gain time and space efficiency by using a sparse matrix instead of a data frame.

Any help would be greatly appreciated.

Update #1: Here is some code to generate the data frame. Thanks to Richard for providing it.

x <- structure(list(name = structure(1:3, .Label = c("ABC", "DEF", "GHI"),
                    class = "factor"), 
               factor_1 = c(1L, 0L, 0L), 
               factor_2 = c(0L,1L, 0L), 
               factor_3 = c(0L, 0L, 1L)), 
               .Names = c("name", "factor_1","factor_2", "factor_3"), 
               class = "data.frame",
               row.names = c(NA,-3L))

Recommended answer

It might be a bit more memory efficient (but slower) to avoid copying all the data into a dense matrix, and instead build the sparse matrix one column at a time:

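library(Matrix)
# Convert each feature column to a sparse one-column matrix, then bind them column-wise
y <- Reduce(cbind2, lapply(x[,-1], Matrix, sparse = TRUE))
rownames(y) <- x[,1]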

#3 x 3 sparse Matrix of class "dgCMatrix"
#         
#ABC 1 . .
#DEF . 1 .
#GHI . . 1
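
The blank header row in the printout shows that this construction does not carry the column names over. A minimal sketch of attaching both row and column names explicitly, assuming the same small example data (rebuilt here with `data.frame()` so the snippet runs on its own):

```r
library(Matrix)

# Stand-in for the question's data frame (same values as above).
x <- data.frame(
  name     = c("ABC", "DEF", "GHI"),
  factor_1 = c(1L, 0L, 0L),
  factor_2 = c(0L, 1L, 0L),
  factor_3 = c(0L, 0L, 1L)
)

# Same column-by-column construction as in the answer.
y <- Reduce(cbind2, lapply(x[, -1], Matrix, sparse = TRUE))

# Set row names from the first column and column names from the
# remaining column headers of the data frame.
dimnames(y) <- list(as.character(x$name), names(x)[-1])
```

With the names in place, entries can be looked up by label, for example `y["GHI", "factor_3"]`.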

If you have sufficient memory, you should use Richard's answer, i.e., turn your data.frame into a dense matrix and then use Matrix.
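
Richard's answer is not reproduced here, so the following is only a sketch of the route as described: convert the feature columns to a dense matrix, then pass it to `Matrix()`. The variable names `dense` and `y2` and the extra row `JKL` are illustrative assumptions, not part of the original data.

```r
library(Matrix)

# Stand-in data; the "JKL" row is a hypothetical addition for illustration.
x <- data.frame(
  name     = c("ABC", "DEF", "GHI", "JKL"),
  factor_1 = c(1L, 0L, 0L, 0L),
  factor_2 = c(0L, 1L, 0L, 1L),
  factor_3 = c(0L, 0L, 1L, 0L)
)

# Step 1: dense matrix of the feature columns. This is the copy that
# needs enough memory to hold every cell, zeros included.
dense <- as.matrix(x[, -1])
rownames(dense) <- as.character(x$name)

# Step 2: convert to a sparse representation; dimnames are kept.
y2 <- Matrix(dense, sparse = TRUE)
```

For the 90,000 x 10,000 case in the question, the intermediate `dense` matrix holds 900 million cells, which is why this route depends on having sufficient memory.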
