Theano sqrt returning NaN values


Question

In my code I'm using Theano to calculate a Euclidean distance matrix (code from here):

import theano
import theano.tensor as T
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(squared_euclidean_distances))
def pdist_euclidean(mat):
    return f_euclidean(mat)

But this code causes some values of the matrix to be NaN. I've read that this happens when calculating theano.tensor.sqrt(), and here it's suggested to

Add an eps inside the sqrt (or max(x, EPS))

So I've added an eps to my code:

import theano
import theano.tensor as T

eps = 1e-9

MAT = T.fmatrix('MAT')

squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)

f_euclidean = theano.function([MAT], T.sqrt(eps+squared_euclidean_distances))

def pdist_euclidean(mat):
    return f_euclidean(mat)

I'm adding it before performing the sqrt. I'm getting fewer NaNs, but I'm still getting them. What is the proper solution to the problem? I've also noticed that if MAT is T.dmatrix() there are no NaNs.

Recommended answer

There are two likely sources of NaNs when computing Euclidean distances.

  1. Floating-point approximation errors that produce a slightly negative squared distance where the true value is zero. The square root of a negative number is undefined (assuming you're not interested in the complex solution).

Imagine MAT has the value

[[ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763]
 [-2.3015387   1.74481176 -0.7612069   0.3190391  -0.24937038]
 [ 1.46210794 -2.06014071 -0.3224172  -0.38405435  1.13376944]
 [-1.09989127 -0.17242821 -0.87785842  0.04221375  0.58281521]]

Now, if we break down the computation we see that (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) has value

[[ 10.3838024   14.27675714  13.11072431   7.54348446]
 [ 14.27675714  18.16971188  17.00367905  11.4364392 ]
 [ 13.11072431  17.00367905  15.83764622  10.27040637]
 [  7.54348446  11.4364392   10.27040637   4.70316652]]

and 2 * MAT.dot(MAT.T) has value

[[ 10.3838024   -9.92394296  10.39763039  -1.51676099]
 [ -9.92394296  18.16971188 -14.23897281   5.53390084]
 [ 10.39763039 -14.23897281  15.83764622  -0.65066204]
 [ -1.51676099   5.53390084  -0.65066204   4.70316652]]

The diagonals of these two matrices should be equal (the distance between a vector and itself is zero), and from this textual representation it looks like they are. In fact they differ slightly; the differences are too small to show up when the floating-point values are printed like this.

This becomes apparent when we print the value of the full expression (the second of the matrices above subtracted from the first)

[[  0.00000000e+00   2.42007001e+01   2.71309392e+00   9.06024545e+00]
 [  2.42007001e+01  -7.10542736e-15   3.12426519e+01   5.90253836e+00]
 [  2.71309392e+00   3.12426519e+01   0.00000000e+00   1.09210684e+01]
 [  9.06024545e+00   5.90253836e+00   1.09210684e+01   0.00000000e+00]]

The diagonal is almost entirely zeros, but the item in the second row, second column is now a very small negative value. When you then compute the square root of all these values you get NaN in that position, because the square root of a negative number is undefined (for real numbers).

[[ 0.          4.91942071  1.64714721  3.01002416]
 [ 4.91942071         nan  5.58951267  2.42951402]
 [ 1.64714721  5.58951267  0.          3.30470398]
 [ 3.01002416  2.42951402  3.30470398  0.        ]]
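
The effect can be reproduced outside Theano. The following is a minimal NumPy sketch, with NumPy standing in for Theano; `squared_distances` and the random test matrix are illustrative names that do not appear in the original code. It evaluates the same expression in float32 and counts the negative entries and the resulting NaNs. Which entries go negative, if any, depends on the data and the platform, so no specific output is claimed.

```python
import numpy as np

def squared_distances(mat):
    """Same expression as in the question, evaluated with NumPy."""
    sq_norms = (mat ** 2).sum(1)
    return (sq_norms.reshape((-1, 1)) + sq_norms.reshape((1, -1))
            - 2 * mat.dot(mat.T))

rng = np.random.RandomState(1)
mat32 = rng.randn(200, 5).astype(np.float32)  # illustrative input
sq32 = squared_distances(mat32)

# Entries that should be exactly zero (the diagonal) may be slightly negative.
n_negative = int((sq32 < 0).sum())

with np.errstate(invalid="ignore"):  # silence the sqrt-of-negative warning
    dist32 = np.sqrt(sq32)

# Every negative squared distance becomes a NaN after the square root.
n_nan = int(np.isnan(dist32).sum())
print(n_negative, n_nan)
```

The two printed counts always agree: each NaN in the distance matrix corresponds to one squared distance that rounding pushed below zero.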

  2. Computing the gradient of a Euclidean distance expression with respect to a variable inside the input to the function. This can happen not only if a negative number is generated due to floating-point approximations, as above, but also if any of the inputs are zero length.

    If y = sqrt(x) then dy/dx = 1/(2 * sqrt(x)). So if x = 0 or, for your purposes, if squared_euclidean_distances = 0, the gradient will not be finite: 2 * sqrt(0) = 0 and dividing by zero is undefined. In floating-point arithmetic the division yields inf, which becomes NaN as soon as the chain rule multiplies it by a zero factor.
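
A small NumPy sketch of that derivative, assuming IEEE floating-point behavior (the array `x` and the zero factor standing in for the rest of the chain rule are illustrative only):

```python
import numpy as np

x = np.array([4.0, 0.0], dtype=np.float32)

with np.errstate(divide="ignore", invalid="ignore"):
    # dy/dx = 1 / (2 * sqrt(x)): finite at x = 4, infinite at x = 0.
    grad = 1.0 / (2.0 * np.sqrt(x))
    # Chain rule: multiplying the infinite factor by a zero factor gives NaN.
    chained = grad * np.float32(0.0)

print(grad)
print(chained)
```

On the diagonal of the distance matrix the squared distance does not depend on the input at all, so the zero factor is exactly what the chain rule supplies there.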

    The first problem can be solved by clamping the squared distances so that they are never less than zero:

    T.sqrt(T.maximum(squared_euclidean_distances, 0.))
    

    To solve both problems (if you need gradients), you need to make sure the squared distances are never negative or zero, so bound them below with a small positive epsilon:

    T.sqrt(T.maximum(squared_euclidean_distances, eps))
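
The difference between adding eps and clamping can be seen in a short NumPy sketch. The matrix `sq` is hand-made for illustration: its last diagonal entry is set to -1e-6, roughly the size of a float32 rounding error for values around 10 (float32 machine epsilon is about 1.2e-7). `np.maximum` plays the role of `T.maximum` here.

```python
import numpy as np

eps = 1e-9
# Hand-made squared distances; sq[1, 1] mimics a float32 rounding error.
sq = np.array([[0.0, 4.0],
               [4.0, -1e-6]], dtype=np.float32)

with np.errstate(invalid="ignore"):
    # Adding eps only shifts the values: -1e-6 + 1e-9 is still negative.
    added = np.sqrt(sq + eps)

# Clamping at zero removes the NaN in the forward pass.
clamped_zero = np.sqrt(np.maximum(sq, 0.))

# Clamping at eps also keeps the sqrt derivative 1 / (2 * sqrt(x)) finite.
clamped_eps = np.sqrt(np.maximum(sq, eps))
grad = 1.0 / (2.0 * clamped_eps)

print(np.isnan(added).any(), np.isnan(clamped_zero).any(), np.isfinite(grad).all())
```

This is a plausible reason why `eps + squared_euclidean_distances` with eps = 1e-9 only reduced the number of NaNs for an fmatrix: the negative rounding errors can be much larger than 1e-9. With float64 the errors are on the order of 1e-15, as in the example above, which is well below that eps.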
    

    The first solution makes sense since the problem only arises from approximate representations. The second is a bit more questionable because the true distance is zero so, in a sense, the gradient should be undefined. Your specific use case may allow an alternative solution that maintains the semantics without an artificial bound (e.g. by ensuring that gradients are never computed or used for zero-length vectors). But NaN values can be pernicious: they can spread like weeds.
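
One possible alternative for the first problem, which is not part of the answer above and is shown here only as a hedged NumPy sketch, is to sum the squared coordinate differences directly instead of expanding the square. A sum of squares cannot be negative, and the diagonal is exactly zero. The function name `squared_distances_direct` and the random input are illustrative.

```python
import numpy as np

def squared_distances_direct(mat):
    """Pairwise squared distances from explicit differences."""
    diff = mat[:, None, :] - mat[None, :, :]  # shape (n, n, d)
    return (diff ** 2).sum(-1)

rng = np.random.RandomState(0)
mat = rng.randn(50, 5).astype(np.float32)  # illustrative input

sq = squared_distances_direct(mat)
dist = np.sqrt(sq)  # no negative inputs, so no NaNs

print(bool((sq >= 0).all()), bool(np.isnan(dist).any()))
```

The trade-off is an intermediate array of shape (n, n, d), which can be costly for large inputs. It also leaves the second problem untouched: the gradient of the square root at a distance of exactly zero is still not finite.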
