Consistent results with multiple runs of h2o deeplearning
Problem description
For a certain combination of parameters in the deeplearning function of h2o, I get different results each time I run it.
args <- list(list(hidden = c(200, 200, 200),
                  loss = "CrossEntropy",
                  hidden_dropout_ratio = c(0.1, 0.1, 0.1),
                  activation = "RectifierWithDropout",
                  epochs = EPOCHS))

run <- function(extra_params) {
  model <- do.call(h2o.deeplearning,
                   modifyList(list(x = columns, y = c("Response"),
                                   validation_frame = validation,
                                   distribution = "multinomial",
                                   l1 = 1e-5, balance_classes = TRUE,
                                   training_frame = training),
                              extra_params))
}

model <- lapply(args, run)
What would I need to do in order to get consistent results for the model each time I run this?
Recommended answer
Deep learning with H2O will not be reproducible if it is run on more than a single core. The results and performance metrics may vary slightly each time you train the model. The implementation in H2O uses a technique called "Hogwild!", which increases training speed at the cost of reproducibility on multiple cores.
So if you want reproducible results, you will need to restrict H2O to run on a single core and make sure to use a seed in the h2o.deeplearning call.
Edit based on comment by Darren Cook:
I forgot to include the reproducible = TRUE parameter, which needs to be set in combination with the seed to make the run truly reproducible. Note that this will make it a lot slower to run, and it is not advisable to do this with a large dataset.
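Putting the advice together, a minimal sketch of a reproducible run might look like the following. It assumes columns, training, validation, and EPOCHS are defined as in the question, and that no H2O cluster is already running; the seed value 12345 is an arbitrary example.

h2o::h2o.shutdown(prompt = FALSE)   # stop any existing multi-core cluster, if one is running
library(h2o)
h2o.init(nthreads = 1)              # restrict H2O to a single core

# Same model as in the question, with seed and reproducible = TRUE added.
# Note the parameter is spelled hidden_dropout_ratios (plural) in current
# versions of the h2o R package.
model <- h2o.deeplearning(x = columns, y = "Response",
                          training_frame = training,
                          validation_frame = validation,
                          distribution = "multinomial",
                          hidden = c(200, 200, 200),
                          loss = "CrossEntropy",
                          hidden_dropout_ratios = c(0.1, 0.1, 0.1),
                          activation = "RectifierWithDropout",
                          l1 = 1e-5,
                          balance_classes = TRUE,
                          epochs = EPOCHS,
                          seed = 12345,          # fix the random seed
                          reproducible = TRUE)   # force deterministic (slower) training

Running this block twice on the same data should produce identical models and metrics, at the cost of much longer training time.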