deep learning - a number of naive questions about caffe
Question
I am trying to understand the basics of caffe, in particular to use with python.
My understanding is that the model definition (say a given neural net architecture) must be included in the '.prototxt' file.
And that when you train the model on data using the '.prototxt', you save the weights/model parameters to a '.caffemodel' file.
Also, there is a difference between the '.prototxt' file used for training (which includes learning rate and regularization parameters) and the one used for testing/deployment, which does not include them.
Questions:

- Is it correct that the '.prototxt' is the basis for training, and that the '.caffemodel' is the result of training (weights), obtained by using the '.prototxt' on the training data?
- Is it correct that there is one '.prototxt' for training and one for testing, and that there are only slight differences (learning rate and regularization factors in training), but that the nn architecture (assuming you use neural nets) is the same?
Apologies for such basic questions and possibly some very incorrect assumptions; I am doing some online research, and the lines above summarize my understanding to date.
Let's take a look at one of the examples provided with BVLC/caffe: bvlc_reference_caffenet.
You'll notice that in fact there are 3 '.prototxt' files:

- train_val.prototxt: this file describes the net architecture for the training phase.
- deploy.prototxt: this file describes the net architecture for test time ("deployment").
- solver.prototxt: this file is very small and contains the "meta parameters" for training, for example the learning rate policy, regularization, etc.
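To make the role of solver.prototxt concrete, here is a minimal sketch of such a file. The field names are standard Caffe solver parameters, but the specific values and file names are illustrative, not taken from the CaffeNet example:

```
net: "train_val.prototxt"   # net definition used for training/validation
base_lr: 0.01               # starting learning rate
lr_policy: "step"           # drop the learning rate in steps...
stepsize: 10000             # ...every 10000 iterations...
gamma: 0.1                  # ...by a factor of 10
momentum: 0.9
weight_decay: 0.0005        # L2 regularization strength
max_iter: 45000             # total number of training iterations
snapshot: 5000              # write a .caffemodel every 5000 iterations
snapshot_prefix: "my_model"
solver_mode: GPU
```

Note that none of these appear in deploy.prototxt: they only steer the training process.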
The net architecture represented by train_val.prototxt and deploy.prototxt should be mostly similar. There are a few main differences between the two:
- Input data: during training one usually uses a predefined set of inputs for training/validation. Therefore, train_val usually contains an explicit input layer, e.g., an "HDF5Data" layer or a "Data" layer. On the other hand, deploy usually does not know in advance what inputs it will get; it only contains a statement:

  input: "data"
  input_shape {
    dim: 10
    dim: 3
    dim: 227
    dim: 227
  }

  that declares what input the net expects and what its dimensions should be.
  Alternatively, one can put an "Input" layer:

  layer {
    name: "input"
    type: "Input"
    top: "data"
    input_param { shape { dim: 10 dim: 3 dim: 227 dim: 227 } }
  }
- Input labels: during training we supply the net with the "ground truth" expected outputs; this information is obviously not available during deploy.
- Loss layers: during training one must define a loss layer. This layer tells the solver in what direction it should tune the parameters at each iteration. The loss compares the net's current prediction to the expected "ground truth". The gradient of the loss is back-propagated to the rest of the net, and this is what drives the learning process. During deploy there is no loss and no back-propagation.
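As a concrete illustration of the loss-layer difference: for a classifier, train_val.prototxt typically ends with a loss layer like the one sketched below, which is simply absent from deploy.prototxt (a "Softmax" layer producing class probabilities may appear there instead). The blob names "fc8" and "label" follow the CaffeNet convention and are shown here only as an example:

```
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"     # the net's raw class scores
  bottom: "label"   # ground-truth labels from the data layer
  top: "loss"
}
```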
In caffe, you supply a train_val.prototxt describing the net, the train/val datasets, and the loss. In addition, you supply a solver.prototxt describing the meta parameters for training. The output of the training process is a .caffemodel binary file containing the trained parameters of the net.
Once the net is trained, you can use the deploy.prototxt with the .caffemodel parameters to predict outputs for new and unseen inputs.
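Since the question mentions Python, here is a small sketch of the deploy-time picture. The pycaffe calls are shown only as comments (they require a Caffe installation and real model files); the executable part just decodes the input_shape dims shown earlier, to make explicit what the four numbers mean:

```python
# With pycaffe installed, deploy-time prediction follows this pattern:
#   import caffe
#   net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
#   net.blobs['data'].data[...] = my_batch  # fill the input blob
#   out = net.forward()                     # no loss, no backward pass

# Executable part: interpret the dims of the input_shape from deploy.prototxt.
import re

deploy_input = '''
input: "data"
input_shape { dim: 10 dim: 3 dim: 227 dim: 227 }
'''

# Pull the four dim values out of the prototxt snippet, in order.
dims = [int(d) for d in re.findall(r'dim:\s*(\d+)', deploy_input)]
batch, channels, height, width = dims
print(batch, channels, height, width)  # 10 3 227 227
# i.e., the net expects batches of 10 three-channel (RGB) images of 227x227 pixels.
```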