R: How to use parallelMap (with mlr, xgboost) on a Linux server? Unexpected performance compared to Windows


Question

I am trying to parallelize, at the hyperparameter-tuning level, an xgboost model that I am tuning in mlr, using parallelMap. I have code that works successfully on my Windows machine (with only 8 cores) and would like to make use of a Linux server (with 72 cores). I have not been able to gain any computational advantage by moving to the server, and I think this is a result of holes in my understanding of the parallelMap parameters.

I do not understand the differences between multicore, local, and socket as "modes" in parallelMap. Based on my reading, I think that multicore would suit my situation, but I am not sure. I used socket successfully on my Windows machine, and have tried both socket and multicore on my Linux server, without success. On my Windows machine I used:

parallelStart(mode="socket", cpu=8, level="mlr.tuneParams")

But my understanding is that socket may be unnecessary, or perhaps slow, for parallelizing over many cores that do not need to communicate with each other, as is the case when parallelizing hyperparameter tuning.
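As I understand them, the three modes can be started like this. This is a minimal sketch of my own, not a definitive description; the comments summarize my reading of the parallelMap documentation:

```r
library(parallelMap)

# local: no real parallelization; everything runs serially in the
# master R session (mainly useful for debugging).
parallelStart(mode = "local")
parallelStop()

# multicore: forked worker processes (Unix only); low start-up cost
# because workers share the master's memory at fork time.
parallelStart(mode = "multicore", cpus = 8)
parallelStop()

# socket: independent R sessions connected via sockets; works on
# Windows too, but data must be serialized and shipped to each worker.
parallelStart(mode = "socket", cpus = 8)
parallelStop()
```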

To elaborate on my unsuccessful results on the Linux server: I am not getting errors, but things that would take <24 hours in serial are taking >2 weeks in parallel. Looking at the processes, I can see that I am indeed using several cores.

Each individual call to xgboost runs in a matter of minutes, and I am not trying to speed that up. I am only trying to tune hyperparameters over several cores.

I was concerned that the very slow results on my Linux server were due to xgboost trying to make use of the available cores during model building, so I fed nthread = 1 to xgboost via mlr to ensure that does not happen. Nonetheless, my code seems to run much slower on my larger Linux server than on my smaller Windows computer -- any thoughts as to what might be happening?
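As a sanity check (my own addition, assuming mlr's getHyperPars accessor, which returns the parameter values currently set on a learner), the nthread setting can be inspected directly:

```r
library(mlr)

# Same learner as below, with nthread pinned to 1 so that xgboost's
# internal threading does not compete with parallelMap for cores.
lrn <- makeLearner(
  "classif.xgboost",
  predict.type = "response",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "map",
    nthread = 1))

getHyperPars(lrn)  # should list nthread = 1 among the set values
```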

Thanks so much.

xgb_learner_tune <- makeLearner(
  "classif.xgboost",
  predict.type = "response",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "map",
    nthread=1))

library(parallelMap)
parallelStart(mode="multicore", cpu=8, level="mlr.tuneParams")

tuned_params_trim <- tuneParams(
  learner = xgb_learner_tune,
  task = trainTask,
  resampling = resample_desc,
  par.set = xgb_params,
  control = control,
  measures = list(ppv, tpr, tnr, mmce)
)
parallelStop()

Edit

I am still surprised by the lack of performance improvement when attempting to parallelize at the tuning level. Are my expectations unfair? I am getting substantially slower performance with parallelMap than with serial tuning for the process below:

numeric_ps = makeParamSet(
  makeNumericParam("C", lower = 0.5, upper = 2.0),
  makeNumericParam("sigma", lower = 0.5, upper = 2.0)
)
ctrl = makeTuneControlRandom(maxit=1024L)
rdesc = makeResampleDesc("CV", iters = 3L)

#In serial
start.time.serial <- Sys.time()
res.serial = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                 par.set = numeric_ps, control = ctrl)
stop.time.serial <- Sys.time()
stop.time.serial - start.time.serial

#In parallel with 2 CPUs
start.time.parallel.2 <- Sys.time()
parallelStart(mode="multicore", cpu=2, level="mlr.tuneParams")
res.parallel.2 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                 par.set = numeric_ps, control = ctrl)
parallelStop()
stop.time.parallel.2 <- Sys.time()
stop.time.parallel.2 - start.time.parallel.2

#In parallel with 16 CPUs
start.time.parallel.16 <- Sys.time()
parallelStart(mode="multicore", cpu=16, level="mlr.tuneParams")
res.parallel.16 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                          par.set = numeric_ps, control = ctrl)
parallelStop()
stop.time.parallel.16 <- Sys.time()
stop.time.parallel.16 - start.time.parallel.16 

My console output is (tuning details omitted):

> stop.time.serial - start.time.serial
Time difference of 33.0646 secs

> stop.time.parallel.2 - start.time.parallel.2
Time difference of 2.49616 mins

> stop.time.parallel.16 - start.time.parallel.16
Time difference of 2.533662 mins

I would have expected things to be faster in parallel. Is that unreasonable for this example? If so, when should I expect performance improvements in parallel?

Looking at the terminal, I do seem to be using 2 (and 16) threads/processes (apologies if my terminology is incorrect).
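To double-check what parallelMap itself thinks it is doing, one can query it directly. A sketch of my own, assuming parallelGetOptions(), which reports the currently registered settings:

```r
library(parallelMap)
library(parallel)

detectCores()  # cores visible to R on this machine

# Register two workers at the tuning level, then inspect the
# options parallelMap will actually use for the next mapping.
parallelStart(mode = "multicore", cpus = 2, level = "mlr.tuneParams")
parallelGetOptions()
parallelStop()
```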

Thanks so much for any further input.

Answer

This question is more about guessing what's wrong in your setup than about providing a "real" answer. Maybe you could also change the title, since you did not get "unexpected results".

A few points:

  • nthread = 1 is already the default for xgboost in mlr
  • multicore is the preferred mode on UNIX systems
  • If your local machine is faster than your server, then either your calculations finish very quickly (and the CPU frequency of the two machines differs substantially), or you should think about parallelizing at another level than mlr.tuneParams (see here for more information)
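The overhead point is worth stressing: forking workers and collecting their results has a fixed cost per element, so parallelizing very short jobs can be slower than running them serially. A minimal sketch of my own using base R's parallel package (Unix only; mc.cores = 2 is an arbitrary choice):

```r
library(parallel)

# Tiny jobs: per-element overhead dominates, so parallel is often
# no faster (or slower) than serial.
t_tiny_serial   <- system.time(lapply(1:2000, sqrt))["elapsed"]
t_tiny_parallel <- system.time(mclapply(1:2000, sqrt, mc.cores = 2))["elapsed"]

# Longer jobs: the overhead is amortized and the speedup approaches 2x.
slow_sqrt <- function(i) { Sys.sleep(0.05); sqrt(i) }
t_long_serial   <- system.time(lapply(1:40, slow_sqrt))["elapsed"]
t_long_parallel <- system.time(mclapply(1:40, slow_sqrt, mc.cores = 2))["elapsed"]

c(tiny_serial = t_tiny_serial, tiny_parallel = t_tiny_parallel,
  long_serial = t_long_serial, long_parallel = t_long_parallel)
```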

Everything is fine on my machine. Looks like a local problem on your side.

library(mlr)
#> Loading required package: ParamHelpers
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(parallelMap)

numeric_ps = makeParamSet(
  makeNumericParam("C", lower = 0.5, upper = 2.0),
  makeNumericParam("sigma", lower = 0.5, upper = 2.0)
)
ctrl = makeTuneControlRandom(maxit=1024L)
rdesc = makeResampleDesc("CV", iters = 3L)

#In serial
start.time.serial <- Sys.time()
res.serial = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
  par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
stop.time.serial <- Sys.time()
stop.time.serial - start.time.serial
#> Time difference of 31.28781 secs


#In parallel with 2 CPUs
start.time.parallel.2 <- Sys.time()
parallelStart(mode="multicore", cpu=2, level="mlr.tuneParams")
#> Starting parallelization in mode=multicore with cpus=2.
res.parallel.2 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
  par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 2; elements = 1024.
#> [Tune] Result: C=1.12; sigma=0.647 : mmce.test.mean=0.0466667
parallelStop()
#> Stopped parallelization. All cleaned up.
stop.time.parallel.2 <- Sys.time()
stop.time.parallel.2 - start.time.parallel.2
#> Time difference of 16.13145 secs


#In parallel with 4 CPUs
start.time.parallel.4 <- Sys.time()
parallelStart(mode="multicore", cpu=4, level="mlr.tuneParams")
#> Starting parallelization in mode=multicore with cpus=4.
res.parallel.4 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
  par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 4; elements = 1024.
#> [Tune] Result: C=0.564; sigma=0.5 : mmce.test.mean=0.0333333
parallelStop()
#> Stopped parallelization. All cleaned up.
stop.time.parallel.4 <- Sys.time()
stop.time.parallel.4 - start.time.parallel.4
#> Time difference of 10.14408 secs

Created on 2019-06-14 by the reprex package (v0.3.0)
