Error occurring in caret when running on a cluster

Question

I am running the train function in caret on a cluster via doRedis. For the most part it works, but every so often I get errors like these at the very end of the run:

error calling combine function:
<simpleError: obj$state$numResults <= obj$state$numValues is not TRUE>

Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : 
  attempt to set an attribute on NULL

When I run traceback(), I get:

5: nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, 
       ppOpts = preProcess, ctrl = trControl, lev = classLevels, 
       ...)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(couple ~ ., training.balanced, method = "nnet", 
       preProcess = "range", tuneGrid = nnetGrid, MaxNWts = 2200)
1: caret::train(couple ~ ., training.balanced, method = "nnet", 
       preProcess = "range", tuneGrid = nnetGrid, MaxNWts = 2200)

These errors are not easily reproducible (they happen sometimes, but not consistently) and only occur at the end of the run. The stdout on the cluster shows all tasks running and completing, so I am a bit flummoxed.

Has anyone encountered these errors and, if so, do you understand the cause or, even better, know of a fix?

Recommended answer

I imagine you've already solved this problem, but I ran into the same issue on my cluster, which consists of Linux and Windows systems. I was running the server on Ubuntu 14.04 and, when starting the server service, had noticed warnings about 'transparent huge pages' being enabled in the Linux kernel. I ignored that message and began running training exercises in which most of the machines were maxed out with workers. I received the same error at the end of the run:

error calling combine function:
<simpleError: obj$state$numResults <= obj$state$numValues is not TRUE>

After a lot of head scratching and useless tinkering, I decided to address that warning by following the instructions here: http://ubuntuforums.org/showthread.php?t=2255151

Essentially, I installed hugeadm using:

sudo apt-get install hugeadm

Then I disabled transparent huge pages using:

hugeadm --thp-never

Note that this change will be undone when the computer restarts.
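Because the setting does not survive a reboot, it can be useful to check which transparent huge pages mode is currently active before starting a long training run. The following is a minimal sketch, not part of the original answer. It assumes the kernel exposes the standard status file /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the word shown in square brackets; the helper name thp_active_mode is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the active THP mode, i.e. the bracketed word
# in a status line such as "always madvise [never]".
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Assumed location of the kernel's THP status file.
THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

if [ -r "$THP_FILE" ]; then
    echo "Active THP mode: $(thp_active_mode "$(cat "$THP_FILE")")"
else
    echo "THP status file not found; this kernel may not expose THP settings"
fi
```

If the script reports anything other than never after a restart, the hugeadm command above would need to be run again before launching the workers.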

When I re-ran my training process, it completed without any errors.

Hope that helps.

Cheers, Eric
