Debugging NaNs in the backward pass


Question


I'm trying to debug a somewhat complicated and non-canonical NN architecture. Computing the forward pass is fine and is giving me the expected results, but when I try to optimize using Adam or any of the standard optimizers, even after one iteration with a very small learning rate I get nans everywhere. I'm trying to localize them and was wondering if there's a way to catch the first occurrence of a nan and detect in which op it arose? I tried tf.add_check_numerics_ops() but it doesn't appear to be doing anything, or perhaps I'm using it incorrectly.

Answer


Debugging NaNs can be tricky, especially if you have a large network. tf.add_check_numerics_ops() adds ops to the graph that assert that each floating-point tensor in the graph does not contain any NaN values, but it does not run these checks by default. Instead, it returns an op that you can run periodically, or on every step, as follows:

import tensorflow as tf  # TF 1.x API; under TF 2.x use tf.compat.v1

train_op = ...  # e.g. the op returned by your optimizer's minimize()
check_op = tf.add_check_numerics_ops()

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run([train_op, check_op])  # Runs one training step and fails fast,
                                # naming the first op that produced a NaN/Inf
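Outside of TensorFlow's graph-level checks, the underlying idea is simple: fetch the gradient tensors and find the first one containing a NaN. A minimal NumPy sketch (the helper name and the tensor names are hypothetical, not part of any TensorFlow API):

```python
import numpy as np

def first_nan(named_tensors):
    """Return the name of the first tensor containing a NaN, or None.

    named_tensors: list of (name, ndarray) pairs, e.g. gradients
    fetched via sess.run in graph order.
    """
    for name, tensor in named_tensors:
        if np.isnan(tensor).any():
            return name
    return None

# Example: the gradient for "dense/bias" is the first to go bad
grads = [
    ("conv/kernel", np.ones((3, 3))),
    ("dense/bias",  np.array([0.5, np.nan])),
    ("out/kernel",  np.zeros((2, 2))),
]
print(first_nan(grads))  # dense/bias
```

Running this after each training step narrows the failure down to a single op, after which you can inspect that op's inputs (learning rate, divisions, logs of non-positive values, etc.).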
