Assignment of a value from a foreach loop
Problem description
I would like to parallelize a loop like this:
td <- data.frame(cbind(c(rep(1,4),2,rep(1,5)),rep(1:10,2)))
names(td) <- c("val","id")
res <- rep(NA,NROW(td))
for(i in levels(interaction(td$id))){
res[td$id==i] <- mean(td$val[td$id!=i])
}
using foreach() with library(doParallel), in order to speed up the computation. Unfortunately, foreach doesn't seem to support direct assignment. At least, this attempt
library(doParallel)
registerDoParallel(4)
res <- rep(NA, NROW(td))
foreach(i = levels(interaction(td$id))) %dopar% {
  res[td$id == i] <- mean(td$val[td$id != i])
}
doesn't do what I want (that is, give the same result as the normal loop above). Any ideas what I am doing wrong, or how I could "hack" the .combine option of foreach to get that result? Please note that the order of the id variable is not always the same in the original data set. Any hint would be very much appreciated!
Solution
You will likely gain orders of magnitude more performance by using data.table for this instead of parallelizing the loop:
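library(data.table)
DT <- data.table(td)
# .I holds the row numbers of the current id group, so DT[-.I, val] is val for all other rows
DT[, means := mean(DT[-.I, val]), by = id]
# res is the result of the for loop in the question
identical(DT$means, res)
#[1] TRUE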
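The quantity computed here is a "mean of everything outside my group", which can also be written as (total sum minus group sum) divided by (total count minus group count). The following is only a minimal base R sketch of that arithmetic, not part of the original answer; the names grp_sum, grp_n and res_vec are made up for illustration, and the data frame is rebuilt with explicit column names so the block runs on its own.

```r
# Same toy data as in the question, built with explicit column names
td <- data.frame(val = rep(c(rep(1, 4), 2, rep(1, 5)), 2), id = rep(1:10, 2))

# Per-row group sum and group size, aligned with the rows of td
grp_sum <- ave(td$val, td$id, FUN = sum)
grp_n   <- ave(td$val, td$id, FUN = length)

# Mean of all rows that do NOT belong to the current row's id group
res_vec <- (sum(td$val) - grp_sum) / (nrow(td) - grp_n)

# Reference result from the plain for loop in the question
res <- rep(NA, nrow(td))
for (i in unique(td$id)) {
  res[td$id == i] <- mean(td$val[td$id != i])
}

# Compare with a tolerance, since both sides are floating point
all.equal(res_vec, res)
```

Because this version works row by row on the original data frame, it does not depend on the order of the id column.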
If you want to use foreach, you'll need to combine it with a merge:
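library(foreach)
# Build one row per id level, then stack the rows with rbind
res2 <- foreach(i = levels(interaction(td$id)), .combine = rbind) %do% {
  data.frame(level = i, means = mean(td$val[td$id != i]))
}
# Attach the per-level means back to the original rows
res2 <- merge(res2, td, by.x = "level", by.y = "id", sort = FALSE)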
# level means val
# 1 1 1.111111 1
# 2 1 1.111111 1
# 3 2 1.111111 1
# 4 2 1.111111 1
# 5 3 1.111111 1
# 6 3 1.111111 1
# 7 4 1.111111 1
# 8 4 1.111111 1
# 9 5 1.000000 2
# 10 5 1.000000 2
# 11 6 1.111111 1
# 12 6 1.111111 1
# 13 7 1.111111 1
# 14 7 1.111111 1
# 15 8 1.111111 1
# 16 8 1.111111 1
# 17 9 1.111111 1
# 18 9 1.111111 1
# 19 10 1.111111 1
# 20 10 1.111111 1
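As far as I understand it, the attempt in the question fails because foreach behaves more like lapply than like a for loop: the body is evaluated away from the calling environment (in worker processes with %dopar%), so an assignment to res inside the body never reaches the original vector. The value of each iteration has to be returned and collected instead. A possible variant of the merge approach, sketched here under that assumption, returns a plain vector and looks values up by id; the names ids, means and res3 are hypothetical, and %do% is used so the block runs without a parallel backend.

```r
library(foreach)

# Same toy data as in the question, built with explicit column names
td <- data.frame(val = rep(c(rep(1, 4), 2, rep(1, 5)), 2), id = rep(1:10, 2))
ids <- unique(td$id)

# Each iteration returns its value; .combine = c collects them into one vector
means <- foreach(i = ids, .combine = c) %do% {
  mean(td$val[td$id != i])
}
names(means) <- ids

# Look up each row's value by id, so the row order of td does not matter
res3 <- unname(means[as.character(td$id)])
```

To run the iterations in parallel, one would register a backend with registerDoParallel() and swap %do% for %dopar%. For a computation this small, the overhead of starting workers will probably outweigh any gain.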