Hyperparameter optimization for Deep Learning Structures using Bayesian Optimization


Question

I have constructed a CLDNN (Convolutional, LSTM, Deep Neural Network) structure for a raw signal classification task.

Each training epoch runs for about 90 seconds, and the hyperparameters seem to be very difficult to optimize.

I have been researching various ways to optimize the hyperparameters (e.g. random or grid search) and found out about Bayesian Optimization.

Although I still do not fully understand the optimization algorithm, I feel it will help me greatly.

I would like to ask a few questions regarding the optimization task:

  1. How do I set up Bayesian Optimization with regards to a deep network? (What is the cost function we are trying to optimize?)
  2. What is the function I am trying to optimize? Is it the cost of the validation set after N epochs?
  3. Is spearmint a good starting point for this task? Any other suggestions for this task?

I would greatly appreciate any insights into this problem.

Answer

"Although I still do not fully understand the optimization algorithm, I feel it will help me greatly."

First up, let me briefly explain this part. Bayesian Optimization methods aim to handle the exploration-exploitation trade-off in the multi-armed bandit problem. In this problem, there is an unknown function, which we can evaluate at any point, but each evaluation costs something (a direct penalty or an opportunity cost), and the goal is to find its maximum using as few trials as possible. Basically, the trade-off is this: you know the function at a finite set of points (some of which are good and some bad), so you can try an area around the current local maximum, hoping to improve it (exploitation), or you can try a completely new area of the space, which could potentially be much better or much worse (exploration), or something in between.

Bayesian Optimization methods (e.g. PI, EI, UCB) build a model of the target function using a Gaussian Process (GP), and at each step choose the most "promising" point based on their GP model (note that "promising" can be defined differently by different methods).

Here is an example:

The true function is f(x) = x * sin(x) (black curve) on the [-10, 10] interval. Red dots represent each trial, the red curve is the GP mean, and the blue curves are the mean plus or minus one standard deviation. As you can see, the GP model doesn't match the true function everywhere, but the optimizer fairly quickly identified the "hot" area around -8 and started to exploit it.
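An example like the one above can be reproduced with a minimal BO loop. This is an illustrative sketch using scikit-learn's `GaussianProcessRegressor` and the Expected Improvement (EI) acquisition, not the exact code behind the plot:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    """True objective (unknown to the optimizer): f(x) = x * sin(x)."""
    return x * np.sin(x)

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    """EI acquisition: expected amount by which a candidate beats y_best."""
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(3, 1))  # a few random initial trials
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True, alpha=1e-6)
candidates = np.linspace(-10, 10, 1000).reshape(-1, 1)

for _ in range(15):  # 15 sequential (expensive) trials
    gp.fit(X, y)
    ei = expected_improvement(candidates, gp, y.max())
    x_next = candidates[[np.argmax(ei)]]  # most "promising" point, shape (1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[0, 0]))

print("best x: %.3f, best f(x): %.3f" % (X[np.argmax(y), 0], y.max()))
```

Each iteration refits the GP to all trials so far and evaluates the true function only at the single point that maximizes EI, which is exactly the exploration-exploitation behavior described above.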

"How do I set up Bayesian Optimization with regards to a deep network?"

In this case, the space is defined by the (possibly transformed) hyperparameters, usually a multidimensional unit hypercube.

For example, suppose you have three hyperparameters: a learning rate α in [0.001, 0.01], a regularizer λ in [0.1, 1] (both continuous), and a hidden layer size N in [50..100] (integer). The space for optimization is then the 3-dimensional cube [0, 1]*[0, 1]*[0, 1]. Each point (p0, p1, p2) in this cube corresponds to a triple (α, λ, N) via the following transformation:

p0 -> α = 10**(p0-3)
p1 -> λ = 10**(p1-1)
p2 -> N = int(p2*50 + 50)
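A runnable Python version of this mapping (the function name `decode` is just for illustration):

```python
def decode(p0, p1, p2):
    """Map a point in the unit cube [0, 1]^3 to actual hyperparameters."""
    alpha = 10 ** (p0 - 3)        # log-uniform in [0.001, 0.01] when p0 in [0, 1]
    lam = 10 ** (p1 - 1)          # log-uniform in [0.1, 1]
    n_hidden = int(p2 * 50 + 50)  # integer in [50, 100]
    return alpha, lam, n_hidden

print(decode(0.0, 0.0, 0.0))  # (0.001, 0.1, 50)
print(decode(1.0, 1.0, 1.0))  # (0.01, 1.0, 100)
```

Note that the continuous parameters are mapped log-uniformly: learning rates and regularizer strengths usually matter on a multiplicative scale, so the optimizer should treat 0.001→0.002 as a step as big as 0.005→0.01.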

"What is the function I am trying to optimize? Is it the cost of the validation set after N epochs?"

Correct, the target function is the neural network's validation accuracy. Clearly, each evaluation is expensive, because it requires at least several epochs of training.

Also note that the target function is stochastic, i.e. two evaluations at the same point may differ slightly. This is not a blocker for Bayesian Optimization, though it obviously increases the uncertainty.
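One common way to tame that stochasticity is to average a few evaluations at the same point. The sketch below simulates a noisy validation score (the quadratic "true" surface and the noise level are made up for illustration; in practice the noise comes from random initialization and data shuffling):

```python
import numpy as np

rng = np.random.default_rng(42)

def noisy_objective(p):
    """Stand-in for 'validation accuracy after N epochs': a hidden true
    score plus per-evaluation noise, so two calls at the same p differ."""
    true_score = 0.9 - (p - 0.6) ** 2
    return true_score + rng.normal(0, 0.02)

def averaged_objective(p, repeats=3):
    """Average a few evaluations to reduce the variance seen by the optimizer."""
    return float(np.mean([noisy_objective(p) for _ in range(repeats)]))

samples = [noisy_objective(0.6) for _ in range(5)]
print("spread of single evaluations:", max(samples) - min(samples))
print("averaged estimate:", averaged_objective(0.6))
```

Averaging multiplies the cost of every trial, so with 90-second epochs it is often better to let the GP absorb the noise (via its noise term) and only re-evaluate the final few best candidates.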

"Is spearmint a good starting point for this task? Any other suggestions for this task?"

spearmint is a good library; you can definitely work with it. I can also recommend hyperopt.

In my own research, I ended up writing my own tiny library, basically for two reasons: I wanted to code the exact Bayesian method to use (in particular, I found that a portfolio strategy of UCB and PI converged faster than anything else in my case); and there is another technique that can save up to 50% of training time, called learning curve prediction (the idea is to skip the full learning cycle when the optimizer is confident the model isn't learning as fast as in other areas). I'm not aware of any library that implements this, so I coded it myself, and in the end it paid off. If you're interested, the code is on GitHub.
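The answer doesn't share the exact learning-curve predictor, but the idea can be sketched with a crude rule: fit the partial validation-accuracy curve, extrapolate it to the full epoch budget, and abandon the trial if the projection trails the best result so far. The `a + b/epoch` model and the example numbers below are assumptions for illustration only:

```python
import numpy as np

def should_stop_early(partial_curve, best_so_far, horizon):
    """Crude learning-curve prediction: extrapolate validation accuracy to
    `horizon` epochs and stop if the projection trails the incumbent best."""
    epochs = np.arange(1, len(partial_curve) + 1)
    # Fit acc ~ a + b / epoch (saturating curves give b < 0, asymptote a).
    A = np.vstack([np.ones_like(epochs, dtype=float), 1.0 / epochs]).T
    a, b = np.linalg.lstsq(A, np.asarray(partial_curve, dtype=float), rcond=None)[0]
    projected = a + b / horizon  # predicted accuracy at the full budget
    return bool(projected < best_so_far)

# A trial whose curve is clearly saturating below the incumbent best (0.90):
slow = [0.50, 0.60, 0.65, 0.68, 0.70, 0.71]
print(should_stop_early(slow, best_so_far=0.90, horizon=50))  # True
```

With ~90-second epochs, killing a hopeless trial after 6 epochs instead of 50 is where the quoted "up to 50% of training time" savings would come from, though a real implementation would also model the uncertainty of the extrapolation before stopping.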
