NaN in the expected values, even though masked, introduces NaN in weight matrix
Problem description
Trying to deal with missing data, I wrote the following model and ran it. The output is given below. Why does the training step on NaN expected values, which are masked by loss_0_where_nan (and the history shows that the loss is indeed evaluated to 0.0), nonetheless introduce NaN weights in the weight matrices of both hidden and max_min_pred? I first thought this might be some weighting of individual parameter learning with output values, which I thought might be specific to the Adadelta optimizer. But it also happens for SGD.
import keras
from keras.models import Model
from keras.optimizers import Adadelta
from keras.losses import mean_squared_error
from keras.layers import Input, Dense
import tensorflow as tf
import numpy

def loss_0_where_nan(loss_function, msg=""):
    def filtered_loss_function(y_true, y_pred):
        with_nans = loss_function(y_true, y_pred)
        nans = tf.is_nan(with_nans)
        filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans)
        filtered = tf.Print(filtered,
                            [y_true, y_pred, nans, with_nans, filtered],
                            message=msg)
        return filtered
    return filtered_loss_function

input = Input(shape=(3,))
hidden = Dense(2)(input)
min_pred = Dense(1)(hidden)
max_min_pred = Dense(1)(hidden)

model = Model(inputs=[input],
              outputs=[min_pred, max_min_pred])
model.compile(
    optimizer=Adadelta(),
    loss=[loss_0_where_nan(mean_squared_error, "aux: "),
          loss_0_where_nan(mean_squared_error, "main: ")],
    loss_weights=[0.2, 1.0])

def random_values(n, missing=False):
    for i in range(n):
        x = numpy.random.random(size=(2, 3))
        _min = numpy.minimum(x[..., 0], x[..., 1])
        if missing:
            _max_min = numpy.full((len(x), 1), numpy.nan)
        else:
            _max_min = numpy.maximum(_min, x[..., 2]).reshape((-1, 1))
        # print(x, numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min), sep="\n", end="\n\n")
        yield x, [numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min)]

model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)
print("With missing")
history = model.fit_generator(random_values(1, True),
                              steps_per_epoch=1,
                              verbose=False)
print("Normal")
model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)
print(history.history)
Output:
main: [[0.29131493][0.769406676]][[-1.38235903][-3.32388687]][0 0][2.80118465 16.7550526][2.80118465 16.7550526]
aux: [[0.0422333851][0.0949674547]][[1.01466811][0.648737907]][0 0][0.945629239 0.306661695][0.945629239 0.306661695]
main: [[0.451149166][0.671600938]][[-2.46504498][-2.74316335]][0 0][8.50418854 11.6606159][8.50418854 11.6606159]
aux: [[0.451149166][0.355992794]][[0.893445313][0.917516708]][0 0][0.195625886 0.315309107][0.195625886 0.315309107]
With missing
aux: [[0.406784][0.44401589]][[0.852455556][1.23527527]][0 0][0.198623136 0.62609148][0.198623136 0.62609148]
main: [[nan][nan]][[-3.2140317][-2.22139478]][1 1][nan nan][0 0]
Normal
aux: [[0.490041673][0.00489727268]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.867286][0.949406743]][[nan][nan]][1 1][nan nan][0 0]
aux: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
{'loss': [0.08247146010398865], 'dense_1_loss': [0.41235730051994324], 'dense_2_loss': [0.0]}
Recommended answer
It seems like a problem similar to this TF issue about tf.where().
When y_true is nan, the gradient of filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans) is calculated as d/dw(filtered) = 1 * d/dw(tf.zeros_like) + 0 * d/dw(with_nans). Since d/dw(with_nans) is nan in this case, the final gradient is 1 * 0 + 0 * nan = nan: in floating-point arithmetic, 0 * nan is still nan, so zeroing the loss value does not stop NaN from propagating through the gradient.
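This failure mode can be reproduced outside Keras. The following is a minimal sketch of my own (not from the original answer), assuming TF 1.x graph mode to match the question's tf.is_nan/tf.Print usage; the masked loss evaluates to 0.0, yet its gradient is already nan:

import tensorflow as tf

w = tf.Variable([1.0])
y_true = tf.constant([float("nan")])  # a missing target
y_pred = 2.0 * w

with_nans = tf.square(y_true - y_pred)  # nan, like the unmasked loss
nans = tf.is_nan(with_nans)
filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans)  # value: 0.0

grad = tf.gradients(filtered, w)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([filtered, grad]))  # loss is [0.], gradient is [nan]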
To avoid this issue, instead of setting the nan loss values to 0, you can set y_true to y_pred in order to get 0 loss values whenever y_true is nan:
def loss_0_where_nan(loss_function):
    def filtered_loss_function(y_true, y_pred):
        nans = tf.is_nan(y_true)
        # Replace nan targets with the prediction itself *before* computing
        # the loss, so NaN never enters the loss graph or its gradient.
        masked_y_true = tf.where(nans, y_pred, y_true)
        filtered = loss_function(masked_y_true, y_pred)
        return filtered
    return filtered_loss_function
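For completeness, a sketch of the matching compile call; it mirrors the question's setup, with the msg/tf.Print debugging dropped from the wrapper:

model.compile(
    optimizer=Adadelta(),
    loss=[loss_0_where_nan(mean_squared_error),
          loss_0_where_nan(mean_squared_error)],
    loss_weights=[0.2, 1.0])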
Since filtered no longer depends on nan values (they are masked out before entering the loss function), the gradients will not contain nans.
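The same toy setup as in the earlier sketch can confirm this (again my own illustration under TF 1.x assumptions, not part of the original answer); the gradient is now a clean 0 rather than nan:

import tensorflow as tf

w = tf.Variable([1.0])
y_true = tf.constant([float("nan")])
y_pred = 2.0 * w

# Mask before the loss: where y_true is nan, compare y_pred against itself.
masked_y_true = tf.where(tf.is_nan(y_true), y_pred, y_true)
loss = tf.square(masked_y_true - y_pred)  # 0.0 for masked entries

grad = tf.gradients(loss, w)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([loss, grad]))  # [0.] and [0.]: finite gradient

With the masked loss, the weight matrices stay finite, as the model.get_weights() dump below shows: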
>>> model.get_weights()
[array([[ 0.9761261 , -0.7472908 ],
[-0.12295872, 0.39413464],
[-0.16676795, 0.30844116]], dtype=float32),
array([-0.00581209, 0.00300716], dtype=float32),
array([[-0.31789184],
[-0.87912357]], dtype=float32),
array([0.00628144], dtype=float32),
array([[-1.0932552 ],
[ 0.11788104]], dtype=float32),
array([0.00575602], dtype=float32)]