Caffe | solver.prototxt values setting strategy
Question
On Caffe, I am trying to implement a Fully Convolutional Network for semantic segmentation. I was wondering whether there is a specific strategy for setting the 'solver.prototxt' values for the following hyper-parameters:
- test_iter
- test_interval
- iter_size
- max_iter
Does it depend on the number of images you have for your training set? If so, how?
Answer
In order to set these values in a meaningful manner, you need to have a few more bits of information regarding your data:
1. Training set size: the total number of training examples you have; let's call this quantity T.
2. Training batch size: the number of training examples processed together in a single batch. This is usually set by the input data layer in 'train_val.prototxt'. For example, in this file the train batch size is set to 256. Let's denote this quantity by tb.
3. Validation set size: the total number of examples you set aside for validating your model; let's denote this by V.
4. Validation batch size: the value set in batch_size for the TEST phase. In this example it is set to 50. Let's call this vb.
Now, during training, you would like to get an unbiased estimate of the performance of your net every once in a while. To do so, you run your net on the validation set for test_iter iterations. To cover the entire validation set you need test_iter = V/vb.
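As a minimal sketch of that calculation (the dataset and batch sizes here are hypothetical, not from the answer), rounding up guards against a validation set that does not divide evenly by the batch size:

```python
import math

# Hypothetical numbers for illustration:
V = 5000   # validation set size
vb = 50    # validation batch size (the TEST-phase batch_size)

# Iterations needed to cover the entire validation set once.
# math.ceil handles the case where vb does not divide V evenly.
test_iter = math.ceil(V / vb)
print(test_iter)  # 100
```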
How often would you like to get this estimate? It's really up to you. If you have a very large validation set and a slow net, validating too often will make the training process too long. On the other hand, not validating often enough may prevent you from noticing if and when your training process fails to converge. test_interval determines how often you validate: usually for large nets you set test_interval on the order of 5K; for smaller and faster nets you may choose lower values. Again, it's all up to you.
In order to cover the entire training set (completing an "epoch") you need to run T/tb iterations. Usually one trains for several epochs, thus max_iter = #epochs * T/tb.
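The same arithmetic for max_iter can be sketched as follows (the training-set size, batch size, and epoch count are illustrative assumptions, not values from the answer):

```python
import math

# Hypothetical numbers for illustration:
T = 50000        # training set size
tb = 256         # training batch size
num_epochs = 50  # desired number of passes over the training data

# Iterations per epoch, rounded up so a partial final batch still counts.
iters_per_epoch = math.ceil(T / tb)
max_iter = num_epochs * iters_per_epoch
print(iters_per_epoch, max_iter)  # 196 9800
```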
Regarding iter_size: this allows you to average gradients over several training mini-batches; see this thread for more information.
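Putting the pieces together, a solver.prototxt might look like the sketch below. All numbers here are illustrative assumptions for a hypothetical dataset; the learning-rate settings (base_lr, lr_policy, stepsize) and snapshot settings are not discussed in the answer and must be tuned for your own problem:

```protobuf
# Illustrative values only -- adapt to your own dataset sizes.
net: "train_val.prototxt"
test_iter: 100        # V / vb, e.g. 5000 validation images / batch of 50
test_interval: 5000   # validate every 5K training iterations (large net)
base_lr: 0.01         # assumed starting learning rate
lr_policy: "step"
stepsize: 20000
max_iter: 9800        # #epochs * T/tb, e.g. 50 * ceil(50000 / 256)
iter_size: 1          # increase to average gradients over several batches
snapshot: 5000
snapshot_prefix: "snapshots/fcn"
solver_mode: GPU
```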