How to detect source of under fitting and vanishing gradients in pytorch?


Question


How to detect source of vanishing gradients in pytorch?

By vanishing gradients, I mean that the training loss doesn't go down below some value, even on limited sets of data.

I am trying to train a network, and I have the above problem, in which I can't even get the network to overfit, and I can't understand the source of the problem.

I've spent a long time googling this, and only found ways to prevent overfitting, but nothing about underfitting or, specifically, vanishing gradients.


What I did find:

A Pytorch forum discussion about "bad gradients". It only refers to exploding gradients and NaN gradients, and leads to here and here, which are more of the same.
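
For example, this is the kind of quick check those threads suggest (a rough sketch of my own; it assumes model is an nn.Module, runs right after loss.backward(), and uses purely illustrative thresholds):

import torch

# Inspect each parameter's gradient after loss.backward()
for name, param in model.named_parameters():
    if param.grad is None:
        continue  # parameter was not part of this backward pass
    if torch.isnan(param.grad).any():
        print(f'{name}: NaN gradient')
        continue
    grad_norm = param.grad.norm().item()
    if grad_norm < 1e-7:
        print(f'{name}: possibly vanishing (norm={grad_norm:.2e})')
    elif grad_norm > 1e3:
        print(f'{name}: possibly exploding (norm={grad_norm:.2e})')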

I know that "making the network larger or more complex" is a generally suggested way of causing overfitting (which is desired right now).
I also know that very deep networks can have their gradients vanish.
It is not clear to me that a larger network would solve the problem, because it could create its own problem, as I just stated, and again I would not know how to debug this while still seeing roughly the same behavior.
Changing the architecture to some res-net could help, but it also might not, because the problem has not been pinpointed to network depth.
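
For reference, this is the kind of residual block I mean (just a sketch; channels is a placeholder):

import torch.nn as nn

class ResidualBlock(nn.Module):
    # The skip connection gives gradients a direct path backwards,
    # which is why res-nets mitigate vanishing gradients in deep networks.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.01)

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        return self.act(out + x)  # skip connection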

Dead ReLU can cause underfitting, and indeed moving to LeakyReLU helps, but still not enough.
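
For example, the swap looks like this (a sketch with placeholder layer sizes):

import torch.nn as nn

# LeakyReLU keeps a small slope for negative inputs,
# so units cannot "die" and stop passing gradients entirely.
net = nn.Sequential(
    nn.Linear(128, 64),
    nn.LeakyReLU(negative_slope=0.01),  # instead of nn.ReLU()
    nn.Linear(64, 10),
)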


How would one debug sources of underfitting in Pytorch, specifically those caused by vanishing gradients?

Instead of blindly trying things, I would like to be able to properly visualize the gradients in my network, so that I know what I am actually trying to solve instead of guessing.
Surely I am not the first one to have this requirement, and tools and methodologies have been created for this purpose.

I would like to read about them, but don't know what to look for.

The specific net I have right now is irrelevant, as this is a general question about methodology.

Solution

You can use TensorBoard with Pytorch to visualize the training gradients: add the gradients to a TensorBoard histogram during training.


For example...

Let:

  • model be your pytorch model
  • model_input be an example input to your model
  • run_name be a string identifier for your training session

from torch.utils.tensorboard import SummaryWriter


summary_writer = SummaryWriter(comment=run_name)
summary_writer.add_graph(model, model_input, verbose=True)


# Training loop

for step_index in ...:
    
    # Calculate loss etc
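    # (param.grad is populated once loss.backward() has run)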

    for name, param in model.named_parameters():
        summary_writer.add_histogram(f'{name}.grad', param.grad, step_index)
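
To view the histograms, point TensorBoard at the log directory (SummaryWriter writes under ./runs/ by default):

tensorboard --logdir runs

Vanishing gradients then show up as histograms that collapse toward zero in the earlier layers as training progresses. If the raw histograms are hard to compare, a variant (not in the original answer) is to log each layer's gradient norm as a scalar in the same loop:

for name, param in model.named_parameters():
    if param.grad is not None:
        summary_writer.add_scalar(f'grad_norm/{name}', param.grad.norm(), step_index)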
