R assigning function call to two different cores

Question

So far, all I've read about parallel processing in R involves looking at multiple rows of one dataframe.

But what if I have two or three large data frames that I want to run a long function on? Can I assign each call of the function to a specific core so I don't have to wait for the calls to run sequentially? I'm on Windows.

Let's say this is the function:

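library(stringr)  # str_extract()
library(iotools)  # dstrfw()

AltAlleleRecounter <- function(names, data) {
  data$AC <- 0
  numalleles <- numeric(length = nrow(data))
  for (i in names) {
    # Keep only the leading "a/b" genotype of each sample column, e.g. "0/1"
    genotype <- str_extract(data[, i], "^[^/]/[^/]")
    # Split it into three 1-character fields: V1 = allele 1, V2 = "/", V3 = allele 2
    GT <- dstrfw(genotype, c('character', 'character', 'character'), c(1L, 1L, 1L))
    called <- which(GT$V1 != '.')  # rows with a called genotype (which() drops NAs)
    # The alleles were parsed as character, so convert them before adding
    data$AC[called] <- data$AC[called] + as.integer(GT$V1[called]) + as.integer(GT$V3[called])
    numalleles[called] <- numalleles[called] + 2
  }
  data$AF <- data$AC / numalleles
  return(data)
}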

What I want to do is basically this (generic pseudocode):

wait_till_everything_is_finished(
core1="data1 <- AltAlleleRecounter(sampleset1,data1,1)",
core2="data2 <- AltAlleleRecounter(sampleset2,data2,2)",
core3="data3 <- AltAlleleRecounter(sampleset3,data3,3)"
)

where all three commands are running but the program doesn't progress until everything is done.
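The pseudocode above maps fairly directly onto base R's parallel package, which works on Windows. This is a minimal sketch under assumptions, not the asker's code: slow_scale() is a hypothetical stand-in for a long-running function such as AltAlleleRecounter(), and the toy data frames replace the real ones. clusterMap() sends call i, slow_scale(factor_list[[i]], data_list[[i]]), to a worker process and only returns once every call has finished, so the script does not move on until all jobs are done. Note that this does not pin a call to a particular core: each call goes to a worker process, and the operating system decides which core runs it.

```r
library(parallel)  # ships with R; PSOCK clusters also work on Windows

# Hypothetical stand-in for a long-running function such as AltAlleleRecounter()
slow_scale <- function(multiplier, data) {
  Sys.sleep(1)  # simulate a long computation
  data * multiplier
}

data_list   <- list(data.frame(x = 1:3), data.frame(x = 4:6), data.frame(x = 7:9))
factor_list <- list(1, 2, 3)

cl <- makeCluster(3)  # three worker processes, one per job
# Call i is slow_scale(factor_list[[i]], data_list[[i]]); clusterMap() runs the
# calls on the workers and returns only after all of them have finished
results <- clusterMap(cl, slow_scale, factor_list, data_list)
stopCluster(cl)  # shut the workers down

results[[3]]$x  # 21 24 27
```

Because the three calls run at the same time, the whole block takes roughly one second of sleeping instead of three (plus the time needed to start the workers).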

Bryan's suggestion worked. I replaced "otherList" with my second list. Here is my code:

myframelist <- list(data1, data2)
mynameslist <- list(names1, names2)
# .packages loads the packages AltAlleleRecounter() needs on every worker
myframelist <- foreach(i = 1:2, .packages = c("stringr", "iotools")) %dopar% (AltAlleleRecounter(mynameslist[[i]], myframelist[[i]]))
myfilenamelist <- list("data1.tsv", "data2.tsv")
foreach(i = 1:2) %dopar% (write.table(myframelist[[i]], file = myfilenamelist[[i]], quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE))

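The data variables are data frames and the names variables are plain character vectors. The worker processes are separate R sessions that do not share the main session's loaded packages, so you may need to load the packages your function uses (here stringr and iotools) on the workers, for example with foreach's .packages argument.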

Answer

Try something like this:

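library(doParallel)  # also attaches parallel, which provides makeCluster()
library(foreach)

cl <- makeCluster(6)  ## number of worker processes; set as many as you need/want/have (parallel::detectCores() reports your core count)
registerDoParallel(cl)
getDoParWorkers()  # should be the number you registered. If not, something went wrong.

df1 <- data.frame(matrix(1:9, ncol = 3))
df2 <- data.frame(matrix(1:9, ncol = 3))
df3 <- data.frame(matrix(1:9, ncol = 3))
mylist <- list(df1, df2, df3)

otherList <- list(1, 2, 3)

# Iteration i pairs mylist[[i]] with otherList[[i]] and runs on a worker
mylist <- foreach(i = 1:3) %dopar% (mylist[[i]] * otherList[[i]])
stopCluster(cl)  # release the worker processes once the work is done
mylist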

[[1]]
  X1 X2 X3
1  1  4  7
2  2  5  8
3  3  6  9

[[2]]
  X1 X2 X3
1  2  8 14
2  4 10 16
3  6 12 18

[[3]]
  X1 X2 X3
1  3 12 21
2  6 15 24
3  9 18 27

I do this fairly often when topic-modeling different databases. The idea is to put the data you want to apply your function to into lists, then have foreach apply your function to the matching elements of those lists in parallel. %dopar% does not return until every iteration has finished, so the script waits for all jobs to complete, as you wanted. For your example, you will need one list of your data.frames and another list of your sample sets.
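A variation on the same idea, sketched under assumptions: instead of indexing with i, foreach can walk two lists in lockstep (with several iteration arguments it advances them together), so each task receives only its own data frame and name vector rather than the whole lists. count_columns() and run_paired() are hypothetical helpers made up for this illustration: count_columns() stands in for AltAlleleRecounter(), and run_paired() wraps cluster setup and teardown so the workers are shut down even if a job fails.

```r
library(doParallel)  # attaches foreach and parallel

# Hypothetical stand-in for AltAlleleRecounter(): sums the named columns per row
count_columns <- function(cols, data) {
  data$total <- rowSums(data[, cols, drop = FALSE])
  data
}

# Hypothetical helper: run f(names_list[[k]], data_list[[k]]) for every k in parallel
run_paired <- function(f, names_list, data_list,
                       workers = length(data_list), packages = character()) {
  cl <- makeCluster(workers)
  registerDoParallel(cl)
  # Always stop the workers and fall back to sequential execution on exit
  on.exit({ stopCluster(cl); registerDoSEQ() }, add = TRUE)
  # nm and dat advance together; %dopar% blocks until every task has finished
  # and returns the results in the same order as the input lists
  foreach(nm = names_list, dat = data_list, .packages = packages) %dopar% f(nm, dat)
}

# Usage with toy data
dfs <- list(data.frame(a = 1:3, b = 4:6), data.frame(a = 10:12, b = 0:2))
nms <- list(c("a", "b"), "a")
res <- run_paired(count_columns, nms, dfs)
res[[1]]$total  # 5 7 9
```

With the real function, the call would presumably look like run_paired(AltAlleleRecounter, list(names1, names2), list(data1, data2), packages = c("stringr", "iotools")), so that each worker loads the packages the function depends on.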
