How is a multiple-outputs deep learning model trained?


Question

I think I do not understand multiple-output networks.

Although I understand how the implementation is made and I successfully trained one model like this, I don't understand how a multiple-output deep learning network is trained. I mean, what is happening inside the network during training?

Take, for example, this network from the Keras functional API guide:

You can see the two outputs (aux_output and main_output). How does the backpropagation work?

My intuition was that the model does two backpropagations, one for each output. Each backpropagation then updates the weights of the layers preceding the exit. But it appears that's not true: from here (SO), I got the information that there is only one backpropagation despite the multiple outputs; the loss used is weighted according to the outputs.

But still, I don't get how the network and its auxiliary branch are trained; how are the auxiliary branch weights updated, as the branch is not connected directly to the main output? Is the part of the network between the root of the auxiliary branch and the main output affected by the weighting of the loss? Or does the weighting influence only the part of the network that is connected to the auxiliary output?

Also, I'm looking for good articles about this subject. I have already read the GoogLeNet / Inception articles (v1, v2-v3), as this network uses auxiliary branches.

Answer

Keras calculations are graph-based and use only one optimizer.

The optimizer is also part of the graph, and in its calculations it gets the gradients for the whole group of weights (not two groups of gradients, one for each output, but one group of gradients for the entire model).

Mathematically, it's not really complicated: you have a final loss function made of:

loss = (main_weight * main_loss) + (aux_weight * aux_loss) #you choose the weights in model.compile

All defined by you, plus a series of other possible weights (sample weights, class weights, regularizer terms, etc.).

Where:

  • main_loss is a function_of(main_true_output_data, main_model_output)
  • aux_loss is a function_of(aux_true_output_data, aux_model_output)

And the gradients are just ∂(loss)/∂(weight_i) for all weights.
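As a sanity check of that formula, here is a minimal pure-Python sketch (not Keras code; the two quadratic losses and the 0.7/0.3 weights are invented for illustration). It verifies numerically that the gradient of the weighted sum of losses equals the weighted sum of the individual loss gradients:

```python
# Toy scalar "model": one weight w feeding two loss terms.
main_weight, aux_weight = 0.7, 0.3

def main_loss(w):
    return (w - 2.0) ** 2      # made-up loss, minimized at w = 2

def aux_loss(w):
    return (w + 1.0) ** 2      # made-up loss, minimized at w = -1

def total_loss(w):
    return main_weight * main_loss(w) + aux_weight * aux_loss(w)

def num_grad(f, w, eps=1e-6):
    # central finite difference approximation of df/dw
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 0.5
combined = num_grad(total_loss, w)
weighted_sum = main_weight * num_grad(main_loss, w) + aux_weight * num_grad(aux_loss, w)
print(abs(combined - weighted_sum) < 1e-6)  # the two gradients agree
```

So differentiating the single combined loss is the same as weighting and summing the per-output gradients; there is no need for a separate backward pass per output.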

Once the optimizer has the gradients, it performs its optimization step once.

Question:

How are the auxiliary branch weights updated, as the branch is not connected directly to the main output?

  • You have two output datasets: one dataset for main_output and another dataset for aux_output. You must pass them to fit in model.fit(inputs, [main_y, aux_y], ...)
  • You also have two loss functions, one for each output, where main_loss takes main_y and main_out, and aux_loss takes aux_y and aux_out.
  • The two losses are summed: loss = (main_weight * main_loss) + (aux_weight * aux_loss)
  • The gradients are calculated for the function loss once, and this function connects to the entire model.
    • The aux term will affect lstm_1 and embedding_1 in backpropagation.
    • Consequently, in the next forward pass (after the weights are updated) it will end up influencing the main branch. (Whether that is better or worse depends only on whether the aux output is useful or not.)
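The point about lstm_1 and embedding_1 can be illustrated with a toy analogue (plain Python, everything scalar; the values are invented for the sketch): a shared weight s plays the role of the shared layers, and two head weights m and a play the roles of the main and aux branches. Hand-applying the chain rule shows the gradient on the shared weight mixes both loss terms, while each head weight only sees its own loss:

```python
# Toy analogue of the shared trunk + two heads:
#   h        = s * x   (shared "embedding/lstm" weight s)
#   main_out = m * h   (main head weight m)
#   aux_out  = a * h   (aux head weight a)
# loss = main_weight*(main_out - y_main)**2 + aux_weight*(aux_out - y_aux)**2

main_weight, aux_weight = 1.0, 0.2
x, y_main, y_aux = 1.0, 3.0, -1.0
s, m, a = 0.5, 1.0, 1.0

h = s * x
main_err = m * h - y_main
aux_err = a * h - y_aux

# Hand-derived gradients via the chain rule:
grad_m = main_weight * 2 * main_err * h            # main head: main loss only
grad_a = aux_weight * 2 * aux_err * h              # aux head: aux loss only
grad_s = (main_weight * 2 * main_err * m
          + aux_weight * 2 * aux_err * a) * x      # shared weight: BOTH losses

print(grad_s)  # includes a contribution from the aux loss
# Setting aux_weight = 0 would remove that contribution from grad_s.
```

This is exactly why the auxiliary branch gets trained even though it is not connected to the main output: its loss term reaches every weight on the path from the input to aux_output, including the shared layers.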
Is the part of the network between the root of the auxiliary branch and the main output affected by the weighting of the loss? Or does the weighting influence only the part of the network that is connected to the auxiliary output?

The weights are plain mathematics. You define them in compile:

model.compile(optimizer=one_optimizer,

              #you choose each loss
              loss={'main_output': main_loss, 'aux_output': aux_loss},

              #you choose each weight
              loss_weights={'main_output': main_weight, 'aux_output': aux_weight},

              metrics = ...)

And the loss function will use them in loss = (weight1 * loss1) + (weight2 * loss2).
The rest is the mathematical calculation of ∂(loss)/∂(weight_i) for each weight.
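To make that concrete, here is a self-contained sketch (plain Python gradient descent, not Keras; the two quadratic losses and the 0.7/0.3 weights are invented). Descending the single combined loss settles on a compromise between the two objectives, and the loss weights decide where that compromise lies:

```python
# Gradient descent on loss = 0.7*(w-2)**2 + 0.3*(w+1)**2:
# one gradient, one update per step, driven by both loss terms at once.
main_weight, aux_weight = 0.7, 0.3
lr, w = 0.1, 0.0
for _ in range(200):
    # hand-derived d(loss)/dw
    grad = main_weight * 2 * (w - 2.0) + aux_weight * 2 * (w + 1.0)
    w -= lr * grad
print(round(w, 3))  # → 1.1, between the two minima (2 and -1), set by the weights
```

Shifting main_weight toward 1 would pull the solution toward 2 (the main objective's minimum); shifting aux_weight up would pull it toward -1. That is all the loss weighting does.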

