R: really slow matrix / data.frame index selection


Question

I am selecting a subset of a data.frame g.raw, like this:

g.raw <- read.table(gfile, sep=',', header=FALSE, row.names=1)
snps <- intersect(row.names(na.omit(csnp.raw)), row.names(na.omit(esnp.raw)))
g <- g.raw[snps,]

It works. However, that last line is EXTREMELY slow.

g.raw has about 18M rows and snps has about 1M entries. I realize these are pretty large numbers, but this seems like a simple operation: reading g.raw into a matrix/data.frame held in memory wasn't a problem (it took a few minutes), whereas the operation described above is taking hours.

How do I speed this up? All I want is to shrink g.raw a lot.

Thanks!

Recommended answer

This seems to be a case where data.table can shine.

Reproducing the data.frame:

set.seed(1)
N <- 1e6    # total number of rows
M <- 1e5    # number of rows to subset

g.raw <- data.frame(sample(1:N, N), sample(1:N, N), sample(1:N, N))
rownames(g.raw) <- sapply(1:N, function(x) paste(sample(letters, 50, replace=TRUE), collapse=""))
snps <- sample(rownames(g.raw), M)

head(g.raw) # looking into newly created data.frame
head(snps)  # and rows for subsetting

data.frame approach:

system.time(g <- g.raw[snps,])
# >    user  system elapsed 
# > 881.039   0.388 884.821 
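
As a side note that is not part of the original answer: a base-R workaround is to resolve the row names to integer positions once with match() and then subset by position. The sketch below uses smaller toy data and made-up column names (a, b) so it runs in seconds. The likely reason it helps, as far as I know, is that character row indexing on a data.frame goes through partial matching of the row names, which scales poorly, while match() does exact lookups. Treat that explanation as an assumption and time it on your own data.

```r
# Smaller toy data than in the answer, so the check finishes quickly.
set.seed(1)
N <- 1e4   # total number of rows
M <- 1e3   # number of rows to subset

g.raw <- data.frame(a = sample(1:N, N), b = sample(1:N, N))
rownames(g.raw) <- sprintf("id%05d", sample(1:N, N))  # unique character row names
snps <- sample(rownames(g.raw), M)

# Resolve the names to integer positions once, using exact matching ...
idx <- match(snps, rownames(g.raw))
# ... and then subset by position instead of by name.
g_fast <- g.raw[idx, ]

# The name-based subset, kept only to check that both give the same rows.
g_slow <- g.raw[snps, ]
stopifnot(isTRUE(all.equal(g_fast, g_slow)))
```

If some entries of snps might be missing from the row names, match() returns NA for them; dropping those first with idx[!is.na(idx)] avoids all-NA rows in the result.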

data.table

library(data.table)
dt.raw <- as.data.table(g.raw, keep.rownames=TRUE)
# rn is a column with rownames(g.raw)
system.time(setkey(dt.raw, rn))
# >  user  system elapsed 
# > 8.029   0.004   8.046 

system.time(dt <- dt.raw[snps,])
# >  user  system elapsed 
# > 0.428   0.000   0.429 
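
If you would rather not reorder dt.raw with setkey(), data.table also supports an ad hoc join through the on= argument. The following is a minimal sketch on smaller toy data (the column names a and b are made up for illustration), assuming a data.table version that supports on=. It also checks that the ad hoc join and the keyed join return the same rows in the order of snps.

```r
library(data.table)

set.seed(1)
N <- 1e4   # total number of rows
M <- 1e3   # number of rows to subset

g.raw <- data.frame(a = sample(1:N, N), b = sample(1:N, N))
rownames(g.raw) <- sprintf("id%05d", sample(1:N, N))
snps <- sample(rownames(g.raw), M)

# Row names land in a column called "rn".
dt.raw <- as.data.table(g.raw, keep.rownames = TRUE)

# Ad hoc join on "rn": no setkey(), so dt.raw keeps its original row order.
dt_on <- dt.raw[.(rn = snps), on = "rn"]

# Keyed join as in the answer, done on a copy so dt.raw stays untouched.
dt_key <- copy(dt.raw)
setkey(dt_key, rn)
dt_keyed <- dt_key[.(snps)]

# Both joins return one row per element of snps, in the order of snps.
stopifnot(identical(dt_on$rn, snps), identical(dt_on$a, dt_keyed$a))
```

Setting a key pays off when the same table is subset many times; for a one-off subset the ad hoc form saves the sorting step.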

Well, that is roughly 100x faster with these N and M, even counting the one-time setkey cost (and the speed-up gets even better with larger N).

You can compare the results:

head(g)
head(dt)
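
head() only shows the first few rows, and the two results are laid out differently: g carries the identifiers as row names, while dt carries them in the rn column. A stricter check could look like the sketch below, which rebuilds both results on smaller toy data (the column names a and b are made up for illustration) and compares them column by column.

```r
library(data.table)

set.seed(1)
N <- 1e4   # total number of rows
M <- 1e3   # number of rows to subset

g.raw <- data.frame(a = sample(1:N, N), b = sample(1:N, N))
rownames(g.raw) <- sprintf("id%05d", sample(1:N, N))
snps <- sample(rownames(g.raw), M)

# data.frame result: identifiers are the row names.
g <- g.raw[snps, ]

# data.table result: identifiers are in the "rn" column.
dt.raw <- as.data.table(g.raw, keep.rownames = TRUE)
setkey(dt.raw, rn)
dt <- dt.raw[.(snps)]

# Same identifiers in the same order, and the same values in every column.
stopifnot(identical(rownames(g), dt$rn))
stopifnot(identical(g$a, dt$a), identical(g$b, dt$b))
```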
