Caffe CNN: diversity of filters within a conv layer

Question

I have the following theoretical questions regarding the conv layer in a CNN. Imagine a conv layer with 6 filters (the conv1 layer and its 6 filters in the figure).

1) What guarantees the diversity of the learned filters within a conv layer? (I mean, how does the learning (optimization) process make sure that it does not learn the same, or similar, filters?)

2) Is diversity of filters within a conv layer a good thing or not? Is there any research on this?

3) During learning (the optimization process), is there any interaction between the filters of the same layer? If yes, how?

Answer

1.

Assuming you are training your net with SGD (or a similar backprop variant), the fact that the weights are initialized at random encourages them to be diverse: since the gradient of the loss w.r.t. each different random filter is usually different, the gradients "pull" the weights in different directions, resulting in diverse filters.

However, there is nothing that guarantees diversity. In fact, filters sometimes become tied to each other (see GrOWL and the references therein) or drop to zero.
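If you want to check this empirically, a quick sanity check is to load the trained weights and look at pairwise cosine similarities between the filters of a layer. Below is a minimal pycaffe sketch; the file names ('deploy.prototxt', 'weights.caffemodel') and the layer name 'conv1' are placeholders for your own model.

```python
# Minimal sketch: measure how similar the learned filters of one conv
# layer are. File names and the layer name are placeholders.
import numpy as np
import caffe

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# Weight blob of conv1: shape (num_filters, channels, kh, kw).
W = net.params['conv1'][0].data
flat = W.reshape(W.shape[0], -1).astype(np.float64)
flat /= np.maximum(np.linalg.norm(flat, axis=1, keepdims=True), 1e-12)

sim = flat.dot(flat.T)               # pairwise cosine similarities
iu = np.triu_indices_from(sim, k=1)  # upper triangle = distinct pairs
print('max  pairwise cosine similarity: %.3f' % sim[iu].max())
print('mean pairwise cosine similarity: %.3f' % sim[iu].mean())
```

Similarities near 1 (or -1) indicate tied or duplicated filters; values near 0 indicate a diverse filter bank.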

2.

Of course you want your filters to be as diverse as possible, to capture all sorts of different aspects of your data. Suppose your first layer only has filters responding to vertical edges: how is your net going to cope with classes containing horizontal edges (or other types of textures)?
Moreover, if you have several filters that are the same, why compute the same responses twice? This is highly inefficient.
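As a toy illustration of the redundancy point (plain numpy/scipy, nothing Caffe-specific): two identical filters produce bit-identical response maps, so the second one adds computation without adding any information.

```python
# Toy illustration: a duplicated filter yields a duplicated response map.
import numpy as np
from scipy.signal import correlate2d

img = np.random.randn(8, 8)             # a random single-channel input
f = np.random.randn(3, 3)               # one 3x3 filter

r1 = correlate2d(img, f, mode='valid')  # response of filter 1
r2 = correlate2d(img, f, mode='valid')  # filter 2 is an exact copy
print(np.array_equal(r1, r2))           # True: the second filter is wasted work
```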

3.

Using "out-of-the-box" optimizers, the learned filters of each layer are independent of each other (by linearity of the gradient). However, one can use more sophisticated loss functions/regularization methods to make them dependent.
For instance, group Lasso regularization can force some of the filters to zero while keeping the others informative.
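For concreteness, here is what that penalty looks like when each filter is treated as one group. This is a plain-numpy sketch of the math, not a built-in Caffe option (Caffe itself only ships L1/L2 weight decay), so in practice it would have to be added to the solver by hand.

```python
# Plain-numpy sketch of a group-Lasso penalty, one group per filter:
#   R(W) = lam * sum_f ||W_f||_2
import numpy as np

def group_lasso(W, lam):
    """Return the penalty and its (sub)gradient for filter bank W.

    W   -- weights of shape (num_filters, channels, kh, kw)
    lam -- regularization strength
    """
    flat = W.reshape(W.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1)  # one L2 norm per filter
    penalty = lam * norms.sum()

    # Subgradient lam * W_f / ||W_f||_2; take 0 for filters already at zero.
    safe = np.where(norms > 0, norms, 1.0)
    grad = lam * flat / safe[:, None]
    grad[norms == 0] = 0.0
    return penalty, grad.reshape(W.shape)

# Example: 6 filters, as in the conv1 layer from the question.
W = np.random.randn(6, 3, 5, 5)
penalty, grad = group_lasso(W, lam=1e-3)
print('penalty: %.4f' % penalty)
```

Because the per-filter penalty is the non-squared L2 norm, the optimizer is encouraged to drive whole filters exactly to zero rather than merely shrinking all weights uniformly, which is what makes it a filter-level sparsity tool.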
