R h2o server CURL error, kind of repeatable


Question

At first I thought it was a random issue, but re-running the script it happens again.

Error in .h2o.doSafeREST(h2oRestApiVersion = h2oRestApiVersion, urlSuffix = urlSuffix,  : 
Unexpected CURL error: Recv failure: Connection reset by peer

I'm running a grid search with a Gradient Boosting Machine (GBM) model on a medium-sized dataset (roughly 40000 rows x 30 columns). The largest number of trees (ntrees) in the grid is 1000. The error usually appears after the script has been running for a couple of hours. I set max_mem_size to 30 GB.

for ( k in 1:nrow(par.grid)) {
    hg = h2o.gbm(training_frame = Xtr.hf, 
                 validation_frame = Xt.hf,
                 distribution="huber",
                 huber_alpha = HuberAlpha,
                 x=2:ncol(Xtr.hf),        
                 y=1,                     
                 ntrees = par.grid[k,"ntree"],
                 max_depth = depth,
                 learn_rate = par.grid[k,"shrink"],
                 min_rows = par.grid[k,"min_leaf"],
                 sample_rate = samp_rate,
                 col_sample_rate = c_samp_rate,
                 nfolds = 5,
                 model_id = p(iname, "_gbm_CV")
                 )
    cv_result[k,1] = h2o.mse(hg, train=TRUE)
    cv_result[k,2] = h2o.mse(hg, valid=TRUE)
  }

Answer

Try adding gc() in your innermost loop. Even better would be to explicitly use h2o.rm().

So your loop would become something like this:

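for ( k in 1:nrow(par.grid)) {
  hg = h2o.gbm(...stuff...,   # same arguments as in the question
             model_id = p(iname, "_gbm_CV")
             )
  cv_result[k,1] = h2o.mse(hg, train=TRUE)
  cv_result[k,2] = h2o.mse(hg, valid=TRUE)
  # Delete the model from the H2O cluster, drop the R handle, then run R's GC
  h2o.rm(hg); rm(hg); gc()
}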

Theoretically this shouldn't matter, but if R holds on to the reference, then H2O will too.

If you think you might want to investigate any of the models further, and you have plenty of local disk space, you could call h2o.saveModel() before your h2o.mse() calls. (Of course, you'll need to give each model a filename that somehow summarizes all of its parameters...)
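As a minimal sketch of such a naming scheme (the helper grid_model_id() and the demo grid are hypothetical, not part of h2o), each row of par.grid can be turned into a unique model_id. By default h2o.saveModel() names the saved file after the model's ID, so passing this string as model_id to h2o.gbm() should give each saved model a descriptive filename. It also stops every grid point from reusing the same model key.

```r
# Hypothetical helper: build an H2O model_id (and therefore the default
# saved file name) that encodes one row of the parameter grid.
grid_model_id <- function(prefix, row) {
  sprintf("%s_gbm_ntree%d_shrink%s_minleaf%d",
          prefix,
          as.integer(row[["ntree"]]),
          format(row[["shrink"]]),
          as.integer(row[["min_leaf"]]))
}

# Demo grid with the same column names the question uses
par.grid <- expand.grid(ntree = c(500, 1000),
                        shrink = c(0.01, 0.05),
                        min_leaf = c(10, 20))

# One ID per grid row
ids <- vapply(seq_len(nrow(par.grid)),
              function(k) grid_model_id("demo", par.grid[k, ]),
              character(1))
print(ids[1])
```

Inside the loop this would be used as model_id = grid_model_id(iname, par.grid[k, ]), followed by h2o.saveModel(hg, path = ...) with a directory of your choice, before h2o.rm(hg).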

UPDATE based on comment: If you do not need to keep any models or data, then using h2o.removeAll() is another way to rapidly reclaim all the memory. (This approach is also worth considering if any data or models you do need preserved are quick and easy to re-load.)
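A rough sketch of the h2o.removeAll() approach follows, assuming the training and validation data can be re-imported from CSV files (train_path and valid_path are hypothetical arguments) and that only the metrics need to survive each grid point. The metrics are copied into a plain R matrix before the cluster is cleared. The question's other GBM arguments (distribution, max_depth, sampling rates, nfolds) are omitted for brevity. The function is only defined here; calling it requires a running H2O cluster.

```r
# Sketch: wipe the H2O cluster between grid points and re-load the data.
# train_path / valid_path are hypothetical CSV locations.
run_grid_with_reset <- function(par.grid, train_path, valid_path) {
  cv_result <- matrix(NA_real_, nrow = nrow(par.grid), ncol = 2,
                      dimnames = list(NULL, c("train_mse", "valid_mse")))
  for (k in seq_len(nrow(par.grid))) {
    # Fresh copies of the frames after the previous h2o.removeAll()
    Xtr.hf <- h2o::h2o.importFile(train_path)
    Xt.hf  <- h2o::h2o.importFile(valid_path)
    hg <- h2o::h2o.gbm(x = 2:ncol(Xtr.hf), y = 1,
                       training_frame = Xtr.hf, validation_frame = Xt.hf,
                       ntrees = par.grid[k, "ntree"],
                       learn_rate = par.grid[k, "shrink"],
                       min_rows = par.grid[k, "min_leaf"])
    # Copy the numbers into plain R before clearing the cluster
    cv_result[k, "train_mse"] <- h2o::h2o.mse(hg, train = TRUE)
    cv_result[k, "valid_mse"] <- h2o::h2o.mse(hg, valid = TRUE)
    # Remove every frame and model from the H2O cluster, then drop the
    # now-stale R handles and let R's garbage collector run
    h2o::h2o.removeAll()
    rm(hg, Xtr.hf, Xt.hf)
    gc()
  }
  cv_result
}
```

Re-importing the data on every iteration costs some time, so this trade-off only pays off when loading the frames is quick compared with training a model.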
