How does one debug NaN values in TensorFlow?

Question

I am running TensorFlow and something in my model is yielding a NaN. I'd like to know what it is, but I don't know how to find out. In a "normal" procedural program I would just put a print statement right before the operation is executed. With TensorFlow I cannot do that, because I first declare (or define) the graph, so adding print statements to the graph definition does not help. Are there any rules, advice, or heuristics for tracking down what might be causing the NaN?

In this case I know more precisely which line to look at, because I have the following:

Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it is a pairwise Euclidean distance
Z = tf.sqrt(Delta_tilde)
Z = Transform(Z) # potentially some transform; currently it returns Z unchanged (the identity) for debugging
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

When this line is present, my summary writers report NaN. Why is this? Is there a way to at least inspect the value of Z after the square root is taken?

For the specific example I posted, I tried tf.Print(0,Z), but with no success: it printed nothing. As in:

Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it is a pairwise Euclidean distance
Z = tf.sqrt(Delta_tilde)
tf.Print(0,[Z]) # <-------- TF PRINT STATEMENT
Z = Transform(Z) # potentially some transform; currently it returns Z unchanged (the identity) for debugging
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

I actually don't understand what tf.Print is supposed to do. Why does it need two arguments? If I want to print one tensor, why would I need to pass two? It seems bizarre to me.

I was looking at the function tf.add_check_numerics_ops(), but it doesn't say how to use it (and the docs don't seem very helpful). Does anyone know how to use it?

Since I've had comments suggesting the data might be bad: I am using standard MNIST. However, I am computing a quantity that is positive (a pairwise Euclidean distance) and then taking its square root. Thus, I don't see how the data specifically would be the issue.

Recommended answer

There are several reasons why you can get a NaN result. Often it is because the learning rate is too high, but plenty of other causes are possible, for example corrupt data in your input queue or computing the log of 0.
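As a framework-free illustration of tracking down where a NaN first appears, the sketch below replays a scalar analogue of the question's pipeline in plain Python and reports the first step whose value is not finite. The helpers `tf_like_sqrt`, `first_non_finite` and `trace` are hypothetical names introduced here, and the sketch assumes WW and XX hold the squared norms of the rows of W and x. Under that assumption, the scalar form of the expression as written, 2xw - (w² + x²), equals -(x - w)², which is never positive, so the square root is a plausible source of the NaN.

```python
import math

def tf_like_sqrt(v):
    # Stand-in for tf.sqrt on a single float: returns NaN for negative input
    # instead of raising ValueError the way math.sqrt does.
    return math.sqrt(v) if v >= 0.0 else float("nan")

def first_non_finite(steps):
    # steps is an ordered list of (name, value) pairs.
    # Return the name of the first value that is NaN or infinite, else None.
    for name, value in steps:
        if not math.isfinite(value):
            return name
    return None

def trace(x, w):
    # Scalar analogue of the question's pipeline (hypothetical helper).
    delta = 2.0 * x * w - (w * w + x * x)  # equals -(x - w)**2
    z = tf_like_sqrt(delta)
    z_sq = z ** 2.0
    a = math.exp(z_sq)
    return [("Delta_tilde", delta), ("sqrt", z), ("pow", z_sq), ("exp", a)]

steps = trace(1.0, 3.0)
culprit = first_non_finite(steps)
print(steps[0], culprit)
```

With x = 1.0 and w = 3.0 the first non-finite value shows up at the square root step; with x equal to w every step stays finite.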

Anyhow, debugging with a print as you describe cannot be done with a plain Python print, as that would only print the tensor's description in the graph and not any actual values.

However, if you use tf.Print as an op when building the graph, then the actual values get printed when the graph is executed (and it is a good exercise to watch these values to debug and understand the behavior of your net).

However, you are not using the print statement entirely correctly. tf.Print is an op, so you need to pass it a tensor and take the result tensor it returns, which you then need to use later on in the executed graph. Otherwise the op is not going to be executed and no printing occurs. Try this:

Z = tf.sqrt(Delta_tilde)
Z = tf.Print(Z,[Z], message="my Z-values:") # <-------- TF PRINT STATEMENT
Z = Transform(Z) # potentially some transform; currently it returns Z unchanged (the identity) for debugging
Z = tf.pow(Z, 2.0)
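Why the dangling tf.Print(0,[Z]) printed nothing can be illustrated with a toy deferred-execution graph in plain Python. This is only a sketch of the idea, not TensorFlow's implementation: `Node`, `constant` and `print_node` are hypothetical names, and `print_node` is loosely modelled on tf.Print as an identity op with a logging side effect.

```python
class Node:
    """A deferred computation: nothing runs until evaluate() is called."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def evaluate(self):
        # Only nodes reachable from the fetched node are ever evaluated.
        return self.fn(*(n.evaluate() for n in self.inputs))

def constant(value):
    return Node(lambda: value)

def print_node(node, log, message):
    # Identity node with a side effect: it passes its input through
    # unchanged and records the value, but only when it is evaluated.
    def passthrough(v):
        log.append(f"{message} {v}")
        return v
    return Node(passthrough, node)

log = []
z = Node(lambda v: v ** 0.5, constant(9.0))

# Dangling: the returned node is discarded, so nothing depends on it.
print_node(z, log, "dangling:")
out_a = Node(lambda v: v ** 2.0, z).evaluate()

# Wired in: z is rebound to the printing node, so fetching the output runs it.
z = print_node(z, log, "my Z-values:")
out_b = Node(lambda v: v ** 2.0, z).evaluate()

print(out_a, out_b, log)
```

Both evaluations compute the same result, but only the second one records a value, because only there does the fetched output depend on the printing node.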
