How to parallelize an xgboost fit?

Problem description

I am trying to fit many xgboost models with different parameters (e.g. for parameter tuning). Running them in parallel is needed to reduce time. However, upon running the %dopar% command I get the following error: Error in unserialize(socklist[[n]]) : error reading from connection.

Below is a reproducible example. It has to do with xgboost, since any other calculation involving global variables works within the %dopar% loop. Could someone point out what is missing/wrong with this approach?

#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)

#### Data Sim
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = 10 + 2*X[,1] + 3*X[,2] + rnorm(n,0,1)

#### Init XGB
train = xgb.DMatrix(data  = X[-((n-10):n),], label = y[-((n-10):n)])
test  = xgb.DMatrix(data  = X[(n-10):n,],    label = y[(n-10):n]) 
watchlist = list(train = train, test = test)

#### Init parallel & run
numCores = detectCores()
cl = parallel::makeCluster(numCores)
doParallel::registerDoParallel(cl)

clusterEvalQ(cl, {
  library(xgboost)
})

pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
  xgb.train(data = train, watchlist = watchlist, max_depth=i, nrounds = 1000, early_stopping_rounds = 10)$best_score
 # if xgb.train is replaced with anything else, e.g. 1+y, it works
} 

stopCluster(cl) 

Recommended answer

As noted in the comments by HenrikB, xgb.DMatrix objects can't be serialized and sent to the parallel workers. To get around this we can create the xgb.DMatrix objects inside of the foreach loop:

#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)
#> Loading required package: iterators

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')

#### Init parallel & run
numCores = detectCores()
cl = parallel::makeCluster(numCores, setup_strategy = "sequential")
doParallel::registerDoParallel(cl)

pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
    # BRING CREATION OF XGB MATRIX INSIDE OF foreach
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
    
    watchlist = list(dtrain = dtrain, dtest = dtest)
    
    param <- list(max_depth = i, eta = 0.01, verbose = 0,
                  objective = "binary:logistic", eval_metric = "auc")
    bst <- xgb.train(param, dtrain, nrounds = 100, watchlist, early_stopping_rounds = 10)
    bst$best_score
    } 

stopCluster(cl) 
pred
#> [[1]]
#> dtest-auc 
#>  0.892138 
#> 
#> [[2]]
#> dtest-auc 
#>  0.987974 
#> 
#> [[3]]
#> dtest-auc 
#>  0.986255 
#> 
#> [[4]]
#> dtest-auc 
#>         1 
#>  ...
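
The same fix carries over to the simulated-data example from the question: keep X and y as plain R objects (foreach exports them to the workers automatically) and build the xgb.DMatrix objects inside the %dopar% body. A minimal sketch, assuming the same simulated data and settings as the question:

#### Load packages
library(xgboost)
library(parallel)
library(foreach)
library(doParallel)

#### Data Sim (same as the question)
n = 1000
X = cbind(runif(n,10,20), runif(n,0,10))
y = 10 + 2*X[,1] + 3*X[,2] + rnorm(n,0,1)
test_idx = (n-10):n

#### Init parallel & run
cl = parallel::makeCluster(detectCores())
doParallel::registerDoParallel(cl)

pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
  # build the DMatrix objects on the worker instead of exporting them
  train = xgb.DMatrix(data = X[-test_idx,], label = y[-test_idx])
  test  = xgb.DMatrix(data = X[test_idx,],  label = y[test_idx])
  watchlist = list(train = train, test = test)

  # verbose = 0 just suppresses per-round logging on the workers
  xgb.train(data = train, watchlist = watchlist, max_depth = i,
            nrounds = 1000, early_stopping_rounds = 10, verbose = 0)$best_score
}

stopCluster(cl)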

Benchmarking:

Since xgb.train is already parallelized, it might be interesting to see the difference in speed between using the threads for xgboost and using them for the parallel running of the tuning rounds.

To do this I wrapped the tuning loop in a function and benchmarked the different combinations:


tune_par <- function(xgbthread, doparthread) {
  
  data(agaricus.train, package='xgboost')
  data(agaricus.test, package='xgboost')
  
  #### Init parallel & run
  cl = parallel::makeCluster(doparthread, setup_strategy = "sequential")
  doParallel::registerDoParallel(cl)
  
  clusterEvalQ(cl, {
    data(agaricus.train, package='xgboost')
    data(agaricus.test, package='xgboost')
  })

  pred = foreach(i = 1:10, .packages = c("xgboost")) %dopar% {
    dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
    dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
    
    watchlist = list(dtrain = dtrain, dtest = dtest)
    
    param <- list(max_depth = i, eta = 0.01, verbose = 0, nthread = xgbthread,
                  objective = "binary:logistic", eval_metric = "auc")
    bst <- xgb.train(param, dtrain, nrounds = 100, watchlist, early_stopping_rounds = 10)
    bst$best_score
  } 
  
  stopCluster(cl) 
  
  pred
  
}

In my testing, evaluation was faster when using more threads for xgboost and fewer for the parallel running of tuning rounds. What works best probably depends on system specs and the amount of data.

# 16 logical cores split between xgb threads and threads in dopar cluster:
microbenchmark::microbenchmark(
  xgb16par1 = tune_par(xgbthread = 16, doparthread = 1),
  xgb8par2 = tune_par(xgbthread = 8, doparthread = 2),
  xgb4par4 = tune_par(xgbthread = 4,doparthread = 4),
  xgb2par8 = tune_par(xgbthread = 2, doparthread = 8),
  xgb1par16 = tune_par(xgbthread = 1,doparthread = 16),
  times = 5
)
#> Unit: seconds
#>       expr      min       lq     mean   median       uq      max neval  cld
#>  xgb16par1 2.295529 2.431110 2.500170 2.519277 2.527914 2.727021     5 a   
#>   xgb8par2 2.301189 2.308377 2.407767 2.363422 2.465446 2.600402     5 a   
#>   xgb4par4 2.632711 2.778304 2.875816 2.825471 2.849003 3.293593     5  b  
#>   xgb2par8 4.508485 4.682284 4.752776 4.810461 4.822566 4.940085     5   c 
#>  xgb1par16 8.493378 8.550609 8.679931 8.768008 8.779718 8.807943     5    d
