Consistent results with multiple runs of h2o deeplearning

Question

For a certain combination of parameters in the deeplearning function of h2o, I get different results each time I run it.

# One parameter set per model; EPOCHS is defined elsewhere.
args <- list(list(hidden = c(200, 200, 200),
                  loss = "CrossEntropy",
                  hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                  activation = "RectifierWithDropout",
                  epochs = EPOCHS))

# Train one model: merge the shared settings with one set of extra parameters.
run <- function(extra_params) {
  do.call(h2o.deeplearning,
          modifyList(list(x = columns, y = "Response",
                          validation_frame = validation,
                          distribution = "multinomial",
                          l1 = 1e-5, balance_classes = TRUE,
                          training_frame = training),
                     extra_params))
}

model <- lapply(args, run)

What would I need to do in order to get consistent results for the model each time I run this?

Recommended answer

Deep learning with H2O is not reproducible when it runs on more than a single core: the results and performance metrics may vary slightly each time you train the model. The implementation in H2O uses a technique called "Hogwild!", which increases the speed of training at the cost of reproducibility on multiple cores.

So if you want reproducible results you will need to restrict H2O to run on a single core and make sure to use a seed in the h2o.deeplearning call.

Edit based on a comment by Darren Cook: I forgot to include the reproducible = TRUE parameter, which needs to be set in combination with the seed to make the run truly reproducible. Note that this makes training a lot slower, and it is not advisable with a large dataset.
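
Putting the pieces together, here is a minimal sketch of how the question's code could be adapted. The helper names reproducible_params and run_reproducible and the seed value 1234 are illustrative assumptions, not part of the original code; seed and reproducible are the h2o.deeplearning arguments mentioned above.

```r
# Hypothetical helper: add the two reproducibility settings to one parameter set.
# The default seed is an arbitrary assumption; any fixed integer should do.
reproducible_params <- function(extra_params, seed = 1234) {
  modifyList(extra_params, list(seed = seed, reproducible = TRUE))
}

# Variant of run() from the question that takes its data explicitly
# instead of reading columns/training/validation from the global environment.
run_reproducible <- function(extra_params, columns, training, validation) {
  base <- list(x = columns, y = "Response",
               training_frame = training,
               validation_frame = validation,
               distribution = "multinomial",
               l1 = 1e-5, balance_classes = TRUE)
  do.call(h2o::h2o.deeplearning,
          modifyList(base, reproducible_params(extra_params)))
}

# Usage, assuming the h2o package is installed and that args, columns,
# training and validation are defined as in the question:
# h2o::h2o.init(nthreads = 1)   # start H2O restricted to a single core
# model <- lapply(args, run_reproducible,
#                 columns = columns, training = training,
#                 validation = validation)
```

If an H2O instance is already running, h2o.init will most likely connect to it rather than start a new one, so it may need to be shut down first (for example with h2o.shutdown()) for the single-core limit to take effect.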
