Deep learning - a number of naive questions about caffe


Problem Description

I am trying to understand the basics of caffe, in particular how to use it with python.

My understanding is that the model definition (say a given neural net architecture) must be included in the '.prototxt' file.

And that when you train the model on data using the '.prototxt', you save the weights/model parameters to a '.caffemodel' file.

Also, there is a difference between the '.prototxt' file used for training (which includes learning rate and regularization parameters) and the one used for testing/deployment, which does not include them.

Questions:

  1. Is it correct that the '.prototxt' is the basis for training and that the '.caffemodel' is the result of training (the weights), using the '.prototxt' on the training data?
  2. Is it correct that there is one '.prototxt' for training and one for testing, and that there are only slight differences between them (learning rate and regularization factors on training), but that the nn architecture (assuming you use neural nets) is the same?

Apologies for such basic questions and possibly some very incorrect assumptions; I am doing some online research and the lines above summarize my understanding to date.

Solution

Let's take a look at one of the examples provided with BVLC/caffe: bvlc_reference_caffenet. You'll notice that in fact there are 3 '.prototxt' files:

  • train_val.prototxt: this file describes the net architecture for the training phase.
  • deploy.prototxt: this file describes the net architecture for test time ("deployment").
  • solver.prototxt: this file is very small and contains the "meta parameters" for training, for example the learning rate policy.

The net architectures represented by train_val.prototxt and deploy.prototxt should be mostly similar. There are a few main differences between the two:

  • Input data: during training one usually uses a predefined set of inputs for training/validation. Therefore, train_val usually contains an explicit input layer, e.g., an "HDF5Data" layer or a "Data" layer. On the other hand, deploy usually does not know in advance what inputs it will get; it only contains the statement:

    input: "data"
    input_shape {
      dim: 10
      dim: 3
      dim: 227
      dim: 227
    }
    

    that declares what input the net expects and what its dimensions should be.
    Alternatively, one can put an "Input" layer:

    layer {
      name: "input"
      type: "Input"
      top: "data"
      input_param { shape { dim: 10 dim: 3 dim: 227 dim: 227 } }
    }
    

  • Input labels: during training we supply the net with the "ground truth" expected outputs; this information is obviously not available during deploy.
  • Loss layers: during training one must define a loss layer. This layer tells the solver in which direction it should tune the parameters at each iteration. The loss compares the net's current prediction to the expected "ground truth", and the gradient of the loss is back-propagated to the rest of the net; this is what drives the learning process. During deploy there is no loss and no back-propagation (a minimal python training sketch follows right after this list).
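The sketch below is only illustrative: it assumes pycaffe is installed and that the solver.prototxt discussed above (which itself points at train_val.prototxt) sits in the working directory; the snapshot file name is made up for this example.

    import caffe

    caffe.set_mode_cpu()  # or caffe.set_mode_gpu() if a GPU is available

    # solver.prototxt holds the training "meta parameters" and points at train_val.prototxt
    solver = caffe.SGDSolver('solver.prototxt')

    # run the full training loop: forward pass, loss, back-propagation, parameter updates
    solver.solve()

    # the solver also writes .caffemodel snapshots on its own schedule;
    # saving explicitly here is just for illustration
    solver.net.save('my_model.caffemodel')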
In caffe, you supply a train_val.prototxt describing the net, the train/val datasets and the loss. In addition, you supply a solver.prototxt describing the meta parameters for training. The output of the training process is a .caffemodel binary file containing the trained parameters of the net.
Once the net is trained, you can use the deploy.prototxt together with the .caffemodel parameters to predict outputs for new and unseen inputs.
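Deploy-time prediction from python then looks roughly like the following sketch; the file names and the "prob" output blob are assumptions for this example, and the actual blob names depend on your deploy.prototxt.

    import numpy as np
    import caffe

    caffe.set_mode_cpu()

    # load the deploy-time architecture together with the trained weights
    net = caffe.Net('deploy.prototxt', 'my_model.caffemodel', caffe.TEST)

    # the deploy net only declares the shape of the "data" blob; we fill it ourselves
    net.blobs['data'].reshape(1, 3, 227, 227)                     # a batch of one image
    net.blobs['data'].data[...] = np.random.rand(1, 3, 227, 227)  # stand-in for a real preprocessed image

    # forward pass only: no labels, no loss, no back-propagation
    out = net.forward()
    print(out['prob'].argmax())  # predicted class index, assuming the output blob is named "prob"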

