Transfer learning: why remove the last hidden layer?

Question

Often when reading blog posts about transfer learning, the advice is to remove the last layer, or to remove the last two layers; that is, remove the output layer and the last hidden layer.

So if transfer learning also implies changing the cost function, e.g. from cross-entropy to mean squared error, I understand that you need to change the last output layer from a 1001-way softmax layer to a Dense(1) layer that outputs a float (a sketch of this head swap follows the questions below), but:

  1. Why also change the last hidden layer?
  2. If using Keras and one of the predefined CNN models with ImageNet weights, what will the weights of the two new last layers be initialized to? He initialization or zero initialization?
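
For concreteness, here is a minimal Keras sketch of the head swap described in the question; it is not from the original post, and the choice of ResNet50 and the input shape are illustrative assumptions:

```python
# Hypothetical sketch: swap a pretrained classifier's softmax head for a
# single-float regression head. ResNet50 and the input shape are assumptions.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

# include_top=False drops the original ImageNet softmax head entirely
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))

# New head: a single linear unit that outputs a float instead of class scores
output = layers.Dense(1)(base.output)
model = models.Model(inputs=base.input, outputs=output)

# Mean squared error replaces the cross-entropy cost function
model.compile(optimizer="adam", loss="mse")
```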

Answer

Why remove layers?

If you're only trying to change the cost function, you're not doing transfer learning by most people's definition. Transfer learning is primarily about moving to a new application domain. So for images, that means taking a dog identifier/detector and transferring it to be a bird identifier/detector, not a dog age/weight guesser. (Or taking your 1001-class general-purpose object detector and using it only on security camera footage, etc.)

Most of the literature says that the lower levels of a CNN learn low-level concepts the size of a few pixels, which are fairly general purpose. The middle layers are object detectors corresponding to parts like eyeballs or noses, and the top layers specify the locations of those mid-level objects relative to each other; they represent the highest-level features. The last softmax just says which species of dog it is. Those last, highest-level features are probably not relevant to the new task.

This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful to many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset.

From: http://cs231n.github.io/transfer-learning/
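
As a hedged illustration of that observation, a common recipe is to freeze the generic lower layers and retrain only a new, task-specific top. The following sketch assumes TensorFlow/Keras and VGG16, neither of which the answer specifies, and the head sizes are placeholders:

```python
# Sketch: reuse the generic lower/mid-level features, replace only the
# task-specific top layers. VGG16 and the head sizes are assumptions.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the generic edge/texture/part detectors fixed

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),   # new task-specific hidden layer
    layers.Dense(2, activation="softmax"),  # e.g. bird vs. not-bird
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```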

Here are a couple of other explanations:

https://machinelearningmastery.com/transfer-learning-for-deep-learning/

https://medium.com/nanonets/nanonets-how-to-use-deep-learning-when-you-have-limited-data-f68c0b512cab

What should the new layers be initialized to?

In your original question you asked "He initialized or 0 initialized?". Again, I think this is more of an engineering question, in that there's evidence that some choices work better than others, but I don't know of a widely accepted proof guaranteeing the optimal performance of one over another. Except: don't initialize everything to zero. That's definitely wrong, as you can see in the first post I link to below. Also keep in mind this is just initialization, so even if my knowledge is slightly out of date, all it should cost you is some extra epochs of training rather than outright failure or junk answers. Depending on your problem that may be a large cost or a small one, which should dictate how much time you spend investigating the options and trying some out at a small scale.

http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

https://stats.stackexchange.com/questions/229885/whats-the-recommended-weight-initialization-strategy-when-using-the-elu-activat
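
For reference, here is a small sketch of the initialization options discussed above, using Keras' built-in initializer aliases; the layer sizes are placeholders:

```python
# Sketch of the initializers discussed above; sizes are placeholders.
from tensorflow.keras import layers

# He initialization, commonly paired with ReLU-family activations
dense_he = layers.Dense(256, activation="relu",
                        kernel_initializer="he_normal")

# Glorot/Xavier initialization: Keras' default for Dense layers
dense_glorot = layers.Dense(256, activation="relu",
                            kernel_initializer="glorot_uniform")

# All-zeros initialization: every unit gets identical gradients, so the
# layer never breaks symmetry -- this is the "definitely wrong" option
dense_zeros = layers.Dense(256, activation="relu",
                           kernel_initializer="zeros")
```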
