Deep neural network skip connection implemented as summation vs concatenation?


Problem description

In deep neural networks, we can add skip connections to help:

  • Mitigate the vanishing-gradient problem and speed up training

  • Let the network learn a combination of low-level and high-level features

  • Recover information lost during downsampling (e.g. max pooling)

https://medium.com/@mikeliao/deep-layer-aggregation-combining-layers-in-nn-architectures-2744d29cab8

However, reading through some source code, I noticed that some implementations use concatenation for the skip connections while others use summation. So my question is: what are the benefits of each of these implementations?

Answer

Basically, the difference lies in how the final layers are influenced by the intermediate features.

Standard architectures whose skip connections use element-wise summation (e.g. ResNet) can, to some extent, be viewed as an iterative estimation procedure (see for instance this work), where the features are refined through the successive layers of the network. The main benefits of this choice are that it works well and is compact: it keeps the number of feature channels fixed across a block.
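As a toy NumPy sketch (not the actual ResNet code: the transformation, shapes, and names here are illustrative assumptions), a summation skip connection adds the transformed features back onto the input, which forces the output width to match the input width:

```python
import numpy as np

def residual_block(x, weight):
    """ResNet-style skip connection: output = x + F(x).

    The transformation F must preserve the shape of x, so the
    number of feature channels stays fixed across the block.
    """
    f = np.maximum(0.0, x @ weight)  # toy transformation: linear + ReLU
    return x + f                     # element-wise summation skip

x = np.random.randn(8, 64)           # batch of 8, 64 channels
w = np.random.randn(64, 64)          # must map 64 -> 64 channels
out = residual_block(x, w)
print(out.shape)                     # (8, 64): width unchanged
```

The constraint that `F(x)` match `x`'s shape is exactly why real ResNets insert 1x1 projections when the channel count changes between stages.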

Architectures with concatenated skip connections (e.g. DenseNet) allow the subsequent layers to re-use intermediate representations, maintaining more information, which can lead to better performance. Besides feature re-use, another consequence is implicit deep supervision (as in this work), which allows better gradient propagation across the network, especially for deep ones (in fact, it has been used in the Inception architecture).
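The concatenation variant can be sketched the same way (again a toy illustration, not DenseNet's actual code): new features are appended to the input along the channel axis, so every later layer still sees the raw intermediate representations, and the width grows by a fixed "growth rate":

```python
import numpy as np

def dense_layer(x, weight):
    """DenseNet-style skip connection: output = [x, F(x)].

    The new features are concatenated onto the input, so later
    layers can re-use the intermediate representations directly.
    """
    f = np.maximum(0.0, x @ weight)          # toy transformation
    return np.concatenate([x, f], axis=-1)   # channel-wise concat

x = np.random.randn(8, 64)    # batch of 8, 64 channels
w = np.random.randn(64, 32)   # growth rate of 32 new channels
out = dense_layer(x, w)
print(out.shape)              # (8, 96): width grows by 32
```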

Obviously, if not properly designed, concatenating features can lead to an exponential growth in the number of parameters (this explains, in part, the hierarchical aggregation used in the work you pointed out) and, depending on the problem, feeding so much information forward can lead to overfitting.
