Faster version of combn

Problem description

Is there a way to speed up the combn command to get all unique combinations of 2 elements taken from a vector?

Usually this would be set up like this:

# Get latest version of data.table
library(devtools)
install_github("Rdatatable/data.table",  build_vignettes = FALSE)  
library(data.table)

# Toy data
d <- data.table(id=as.character(paste0("A", 10001:15000))) 

# Transform data 
system.time({
d.1 <- as.data.table(t(combn(d$id, 2)))
})

However, combn is roughly 10 times slower (23 s versus 3 s on my computer) than calculating all possible combinations using data.table:

system.time({
d.2 <- d[, list(neighbor=d$id[-which(d$id==id)]), by=c("id")]
})
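
For orientation, the two snippets do not return the same set of rows: combn(x, 2) yields each unordered pair once, which is choose(n, 2) rows, while the data.table snippet pairs every id with every other id, which is n * (n - 1) rows. A minimal base R sketch on a small made-up vector (the names x, unordered, and ordered are illustrative only) shows the difference in size:

```r
# Small illustrative vector; the real data has 5000 ids
x <- paste0("A", 1:5)
n <- length(x)

# Unordered pairs, as produced by combn: each pair appears once
unordered <- t(combn(x, 2))

# Ordered pairs, equivalent in content to the data.table snippet:
# every id combined with every other id
ordered <- expand.grid(V1 = x, V2 = x, stringsAsFactors = FALSE)
ordered <- ordered[ordered$V1 != ordered$V2, ]

nrow(unordered)  # equals choose(n, 2)
nrow(ordered)    # equals n * (n - 1), twice as many rows
```

This is why the data.table result needs about twice the memory of the combn result for the same input.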

Because I am dealing with very large vectors, I am looking for a way to save memory by calculating only the unique combinations (as combn does), but at the speed of the data.table approach (see the second code snippet).

I appreciate any help.

Solution

You could use combnPrim from gRbase (http://www.inside-r.org/packages/cran/gRbase/docs/combnPrim):

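source("http://bioconductor.org/biocLite.R")
biocLite("gRbase") # will install dependent packages automatically
# (newer Bioconductor releases use BiocManager::install("gRbase") instead of biocLite)
library(gRbase)    # needed so that combnPrim is available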
system.time({
  d.1 <- as.data.table(t(combn(d$id, 2)))
})
#   user  system elapsed 
# 27.322   0.585  27.674 

system.time({
  d.2 <- as.data.table(t(combnPrim(d$id, 2)))
})
#   user  system elapsed 
#  2.317   0.110   2.425 

identical(d.1[order(V1, V2),], d.2[order(V1,V2),])
#[1] TRUE
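
If installing gRbase is not an option, the unique pairs can also be built from index vectors in base R. The following is only a sketch and is not part of the original answer: the helper name unique_pairs is made up for illustration, and no timings are claimed for it, so it would need to be benchmarked against combn and combnPrim on the real data.

```r
# Hypothetical helper (not from gRbase or data.table): all unique pairs
# (i < j) of a vector, in the same order that combn(x, 2) produces them.
unique_pairs <- function(x) {
  n <- length(x)
  if (n < 2L) {
    # No pairs can be formed from fewer than two elements
    return(data.frame(V1 = x[0], V2 = x[0], stringsAsFactors = FALSE))
  }
  counts <- (n - 1L):1L                      # partners remaining for each i
  i <- rep(seq_len(n - 1L), times = counts)  # first index, repeated
  j <- sequence(counts) + i                  # second index, always > i
  data.frame(V1 = x[i], V2 = x[j], stringsAsFactors = FALSE)
}

ids <- paste0("A", 10001:10006)
res <- unique_pairs(ids)
expected <- t(combn(ids, 2))
```

The result can be passed to as.data.table() if a data.table is needed. The function allocates two integer index vectors of length choose(n, 2), so memory use grows with the number of unique pairs rather than with all ordered pairs.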
