Faster version of combn
Question
Is there a way to speed up the combn command to get all unique combinations of 2 elements taken from a vector? Usually this would be set up like this:

    # Get latest version of data.table
    library(devtools)
    install_github("Rdatatable/data.table", build_vignettes = FALSE)
    library(data.table)

    # Toy data
    d <- data.table(id = as.character(paste0("A", 10001:15000)))

    # Transform data
    system.time({
      d.1 <- as.data.table(t(combn(d$id, 2)))
    })
However, combn is 10 times slower (23 sec versus 3 sec on my computer) than calculating all possible combinations using data.table:

    system.time({
      d.2 <- d[, list(neighbor = d$id[-which(d$id == id)]), by = c("id")]
    })
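To make the memory difference between the two snippets concrete, here is a small sketch on a made-up five-element vector (the names `x`, `unordered`, `ordered` and `grid` are illustrative and not from the original post). It assumes the data.table snippet is equivalent to listing every ordered pair without self-pairs, which is emulated here with base R's `expand.grid`:

```r
# Small illustration; the vector x is made up for this example.
x <- paste0("A", 1:5)
n <- length(x)

# Unordered pairs, as returned by combn(): choose(n, 2) rows.
unordered <- t(combn(x, 2))

# Ordered pairs excluding self-pairs, emulating the data.table snippet:
# n * (n - 1) rows, i.e. twice as many as combn().
grid <- expand.grid(id = x, neighbor = x, stringsAsFactors = FALSE)
ordered <- grid[grid$id != grid$neighbor, ]

nrow(unordered)  # choose(5, 2) = 10
nrow(ordered)    # 5 * 4 = 20
```

For the 5000 ids in the toy data, the same arithmetic gives `choose(5000, 2)` = 12,497,500 rows versus `5000 * 4999` = 24,995,000 rows.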
Dealing with very large vectors, I am searching for a way to save memory by only calculating the unique combinations (like combn), but with the speed of data.table (see second code snippet). I appreciate any help.
Solution

You could use combnPrim from gRbase:

    source("http://bioconductor.org/biocLite.R")
    biocLite("gRbase") # will install dependent packages automatically.

    system.time({
      d.1 <- as.data.table(t(combn(d$id, 2)))
    })
    #    user  system elapsed
    #  27.322   0.585  27.674

    system.time({
      d.2 <- as.data.table(t(combnPrim(d$id, 2)))
    })
    #    user  system elapsed
    #   2.317   0.110   2.425

    identical(d.1[order(V1, V2), ], d.2[order(V1, V2), ])
    #[1] TRUE
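If installing gRbase is not an option, the special case of pairs can probably also be handled in base R by generating the two index vectors directly instead of enumerating combinations one at a time. The following is only a sketch: `unique_pairs` is a hypothetical helper name, not a function from gRbase or data.table, and it has not been benchmarked against the timings above.

```r
# Hypothetical helper (not part of gRbase or data.table): builds all
# unordered pairs of x by generating the index vectors directly.
unique_pairs <- function(x) {
  n <- length(x)
  if (n < 2L) {
    # Fewer than two elements: no pairs, return an empty two-column matrix.
    return(matrix(x[0], nrow = 0L, ncol = 2L))
  }
  # First index: 1 repeated (n - 1) times, 2 repeated (n - 2) times, ...
  i <- rep.int(seq_len(n - 1L), times = (n - 1L):1L)
  # Second index: for each value of i, the indices i + 1, ..., n.
  j <- sequence((n - 1L):1L) + i
  cbind(x[i], x[j])
}

# Usage on a small made-up vector, compared against combn().
x <- paste0("A", 10001:10006)
res <- unique_pairs(x)
identical(res, t(combn(x, 2)))
```

The result is a two-column matrix in the same row order as `t(combn(x, 2))`, so it can be passed to `as.data.table()` in the same way as in the snippets above.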