How can I reduce the time foreach takes before and after going over the iterations in parallel?
Problem description
I use foreach + doParallel to apply a function to each row of a matrix in parallel in R. When the matrix has many rows, foreach takes a long time before and after going over the iterations in parallel.
For example, if I run:
library(foreach)
library(doParallel)

doWork <- function(data) {
  # Set up a parallel backend that uses all cores but one
  cores <- detectCores()
  number_of_cores_to_use <- cores[1] - 1  # not to overload the computer
  cat(paste('number_of_cores_to_use:', number_of_cores_to_use))
  cl <- makeCluster(number_of_cores_to_use)
  # foreach exports `data` to the workers automatically,
  # so no clusterExport() call is needed here
  registerDoParallel(cl)

  cat('...Starting foreach initialization')
  output <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
    cat(i)
    y <- data[i, 5]
    a <- 100
    for (j in 1:3) {  # Useless busy work
      b <- matrix(runif(a * a), nrow = a, ncol = a)
    }
    return(runif(10))
  }

  # Stop the cluster
  cat('...Stop cluster')
  stopCluster(cl)
  return(output)
}

r <- 100000
c <- 10
data <- matrix(runif(r * c), nrow = r, ncol = c)
output <- doWork(data)
output[1:10, ]
The CPU usage is as follows (100% means all cores are fully utilized): [image not preserved]
With annotations: [image not preserved]
How can I optimize the code so that foreach doesn't take a long time before and after going over the iterations in parallel? The main time sink is the time spent after. It grows significantly with the number of foreach iterations, sometimes making the code as slow as if a simple for loop were used.
Another example (let's assume lm and poly cannot take matrices as arguments):
library(foreach)
library(doParallel)

doWork <- function(data, weights) {
  # Set up a parallel backend that uses all cores but one
  cores <- detectCores()
  number_of_cores_to_use <- cores[1] - 1  # not to overload the computer
  cat(paste('number_of_cores_to_use:', number_of_cores_to_use))
  cl <- makeCluster(number_of_cores_to_use)
  # Export the `weights` argument of this function
  # (the default would look for a global variable of that name)
  clusterExport(cl = cl, varlist = c('weights'), envir = environment())
  registerDoParallel(cl)

  cat('...Starting foreach initialization')
  output <- foreach(i = 1:nrow(data), .combine = rbind) %dopar% {
    x <- sort(data[i, ])
    fit <- lm(x[1:(length(x) - 1)] ~ poly(x[-1], degree = 2, raw = TRUE),
              na.action = na.omit, weights = weights)
    return(fit$coef)
  }

  # Stop the cluster
  cat('...Stop cluster')
  stopCluster(cl)
  return(output)
}

r <- 10000
c <- 10
weights <- runif(c - 1)
data <- matrix(runif(r * c), nrow = r, ncol = c)
output <- doWork(data, weights)
output[1:10, ]
Recommended answer
Try this:
devtools::install_github("privefl/bigstatsr")
library(bigstatsr)
options(bigstatsr.ncores.max = parallel::detectCores())

doWork2 <- function(data, weights, ncores = parallel::detectCores() - 1) {
  big_parallelize(data, p.FUN = function(X.desc, ind, weights) {
    X <- bigstatsr::attach.BM(X.desc)
    # Preallocate the output for this block of columns, then fill it
    output.part <- matrix(0, 3, length(ind))
    for (i in seq_along(ind)) {
      x <- sort(X[, ind[i]])
      fit <- lm(x[1:(length(x)-1)] ~ poly(x[-1], degree = 2, raw = TRUE),
                na.action = na.omit, weights = weights)
      output.part[, i] <- fit$coef
    }
    t(output.part)
  }, p.combine = "rbind", ncores = ncores, weights = weights)
}

system.time({
  # Transpose so that each original row becomes a column
  data.bm <- as.big.matrix(t(data))
  output2 <- doWork2(data.bm, weights)
})
all.equal(output, output2, check.attributes = FALSE)
This is twice as fast on my computer (which has only 4 cores). Remarks:
- Using more than half of the cores is often useless.
- Your data is not very large, so using a big.matrix may not be useful here.
- big_parallelize separates the matrix into ncores blocks of columns, applies your function to each block, and then combines the results.
- In the function, it's better to create the output before the loop and then fill it than to use a foreach that rbinds all the results.
- I'm accessing only columns, not rows.
So all of these are good practices, yet they are not really relevant for your data. The gain should be higher when using more cores and for larger datasets.
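The block-wise idea from the remarks can also be tried with plain foreach + doParallel, without bigstatsr. The following is only a minimal sketch under assumptions: doWorkChunked is a hypothetical helper name, ncores = 2 is an arbitrary choice, and the matrix is kept small so the example runs quickly. Each worker receives one block of row indices, preallocates its part of the output, fills it, and returns it, so foreach has only ncores results to combine instead of one per row.

```r
library(foreach)
library(doParallel)

# Hypothetical helper: one foreach task per block of rows, not per row.
doWorkChunked <- function(data, weights, ncores = 2) {
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))  # always release the workers
  registerDoParallel(cl)

  # Split the row indices into ncores contiguous blocks.
  chunks <- parallel::splitIndices(nrow(data), ncores)

  foreach(ind = chunks, .combine = rbind) %dopar% {
    # Preallocate this block's result, then fill it row by row.
    part <- matrix(NA_real_, nrow = length(ind), ncol = 3)
    for (k in seq_along(ind)) {
      x <- sort(data[ind[k], ])
      fit <- lm(x[-length(x)] ~ poly(x[-1], degree = 2, raw = TRUE),
                weights = weights)
      part[k, ] <- fit$coefficients
    }
    part
  }
}

set.seed(1)
data <- matrix(runif(200 * 10), nrow = 200, ncol = 10)
weights <- runif(ncol(data) - 1)
output_chunked <- doWorkChunked(data, weights)
dim(output_chunked)
```

Whether this helps depends on how expensive each iteration is compared with the cost of dispatching tasks and combining results, so it is worth timing both versions with system.time on the real data.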
Basically, if you want to be super fast, reimplementing the lm part in Rcpp would be a good solution.
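Short of writing Rcpp, a possible intermediate step that stays in R is to skip the formula interface and call the lower-level fitter stats::lm.wfit on a hand-built design matrix. This is a swapped-in technique, not what the answer proposes, and it rests on the unmeasured assumption that formula handling is a noticeable share of the per-row cost. The sketch below only checks that both routes give the same coefficients for one row; the names y, z and X are illustrative.

```r
set.seed(42)
x <- sort(runif(10))
weights <- runif(length(x) - 1)

y <- x[-length(x)]  # response: all values but the largest
z <- x[-1]          # predictor: all values but the smallest

# Reference fit through the formula interface, as in the question.
coef_lm <- coef(lm(y ~ poly(z, degree = 2, raw = TRUE), weights = weights))

# Same model through the lower-level fitter:
# intercept, linear and quadratic columns built by hand.
X <- cbind(1, z, z^2)
coef_wfit <- lm.wfit(X, y, weights)$coefficients

all.equal(unname(coef_lm), unname(coef_wfit))
```

Unlike lm, lm.wfit does not handle missing values, so rows containing NA would need to be filtered beforehand.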