R - convert BIG table into matrix by column names
Question
This is an extension of an existing question: Convert table into matrix by column names
I am using the final answer: https://stackoverflow.com/a/2133898/1287275
The original CSV file has about 1.5M rows and three columns: row index, column index, and a value. All numbers are long integers. The underlying matrix is sparse, about 220K x 220K in size, with an average of about 7 values per row.
The initial read.table works just fine.
x <- read.table("/users/wallace/Hadoop_Local/reference/DiscoveryData6Mo.csv", header=TRUE);
My problem comes when I run the reshape command.
reshape(x, idvar="page_id", timevar="reco", direction="wide")
The CPU hits 100% and stays there indefinitely. The machine (a Mac) has more memory than R is using, so I don't see why constructing a sparse matrix should take so long.
I am using the default matrix package. I haven't installed anything extra. I downloaded R only a few days ago, so I should have the latest version.
Suggestions?
Thanks, Wallace
Recommended answer
I would use the sparseMatrix function from the Matrix package. The typical usage is sparseMatrix(i, j, x), where i, j, and x are three vectors of the same length: respectively, the row indices, column indices, and values of the non-zero elements of the matrix. Here is an example where I have tried to match variable names and dimensions to your specifications:
num.pages <- 220000
num.recos <- 230000
N <- 1500000
df <- data.frame(page_id = sample.int(num.pages, N, replace=TRUE),
                 reco    = sample.int(num.recos, N, replace=TRUE),
                 value   = runif(N))
head(df)
# page_id reco value
# 1 33688 48648 0.3141030
# 2 78750 188489 0.5591290
# 3 158870 13157 0.2249552
# 4 38492 56856 0.1664589
# 5 70338 138006 0.7575681
# 6 160827 68844 0.8375410
library("Matrix")
mat <- sparseMatrix(i = df$page_id,
                    j = df$reco,
                    x = df$value,
                    dims = c(num.pages, num.recos))
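The example above assumes the ids already run from 1 to the number of pages or recos. If the real ids are arbitrary long integers, one possible approach is to map them onto consecutive indices first and keep the original ids as dimnames. The following is only a sketch under assumptions: the toy data and the column name value are made up for illustration (the question does not name the third column), and pages and recos are hypothetical helper variables.

```r
library(Matrix)

# Toy stand-in for the three-column table read from the CSV.
# The ids and values are invented; "value" is an assumed column name.
x <- data.frame(page_id = c(1001, 1001, 2005, 90210),
                reco    = c(2005, 90210, 1001, 1001),
                value   = c(3, 1, 4, 2))

# Distinct ids, sorted, so each id gets a position in 1..n
pages <- sort(unique(x$page_id))
recos <- sort(unique(x$reco))

# match() turns each id into its position, which sparseMatrix can use
# directly as a 1-based row or column index
mat <- sparseMatrix(i = match(x$page_id, pages),
                    j = match(x$reco, recos),
                    x = x$value,
                    dims = c(length(pages), length(recos)),
                    dimnames = list(as.character(pages),
                                    as.character(recos)))

# Entries can then be looked up by the original ids, passed as strings
mat["1001", "90210"]
```

Note that sparseMatrix adds up the values of repeated (i, j) pairs by default, so duplicated rows in the input table would be summed rather than overwritten.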