NaN in the expected values, even though masked, introduces NaN in weight matrix

Problem description

Trying to deal with missing data, I wrote the following model and ran it. The output is given below. Why does the training step on NaN expected values, which are masked by loss_0_where_nan (and the history shows that the loss is indeed evaluated to 0.0), nonetheless introduce NaN weights in the weight matrices of both hidden and max_min_pred? I first thought this might be some weighting of individual parameter learning with output values, which I thought might be specific to the Adadelta optimizer. But it also happens for SGD.

import keras
from keras.models import Model
from keras.optimizers import Adadelta
from keras.losses import mean_squared_error
from keras.layers import Input, Dense

import tensorflow as tf
import numpy

# Wraps a loss function so that NaN entries in the per-sample loss are replaced
# by 0; tf.Print logs [y_true, y_pred, nan mask, raw loss, filtered loss].
def loss_0_where_nan(loss_function, msg=""):
    def filtered_loss_function(y_true, y_pred):
        with_nans = loss_function(y_true, y_pred)
        nans = tf.is_nan(with_nans)
        filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans)
        filtered = tf.Print(filtered,
                            [y_true, y_pred, nans, with_nans, filtered],
                            message=msg)
        return filtered
    return filtered_loss_function

input = Input(shape=(3,))

hidden = Dense(2)(input)
min_pred = Dense(1)(hidden)
max_min_pred = Dense(1)(hidden)

model = Model(inputs=[input],
              outputs=[min_pred, max_min_pred])

model.compile(
    optimizer=Adadelta(),
    loss=[loss_0_where_nan(mean_squared_error, "aux: "),
          loss_0_where_nan(mean_squared_error, "main: ")],
    loss_weights=[0.2, 1.0])

# Yields batches of shape (2, 3); the targets are min(x0, x1) and max(min, x2).
# With missing=True, the second target is all-NaN to simulate missing labels.
def random_values(n, missing=False):
    for i in range(n):
        x = numpy.random.random(size=(2, 3))
        _min = numpy.minimum(x[..., 0], x[..., 1])
        if missing:
            _max_min = numpy.full((len(x), 1), numpy.nan)
        else:
            _max_min = numpy.maximum(_min, x[..., 2]).reshape((-1, 1))
        # print(x, numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min), sep="\n", end="\n\n")
        yield x, [numpy.array(_min).reshape((-1, 1)), numpy.array(_max_min)]

model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)
print("With missing")
history = model.fit_generator(random_values(1, True),
                              steps_per_epoch=1,
                    verbose=False)
print("Normal")
model.fit_generator(random_values(2, False),
                    steps_per_epoch=2,
                    verbose=False)

print(history.history)

Output:

main: [[0.29131493][0.769406676]][[-1.38235903][-3.32388687]][0 0][2.80118465 16.7550526][2.80118465 16.7550526]
aux: [[0.0422333851][0.0949674547]][[1.01466811][0.648737907]][0 0][0.945629239 0.306661695][0.945629239 0.306661695]
main: [[0.451149166][0.671600938]][[-2.46504498][-2.74316335]][0 0][8.50418854 11.6606159][8.50418854 11.6606159]
aux: [[0.451149166][0.355992794]][[0.893445313][0.917516708]][0 0][0.195625886 0.315309107][0.195625886 0.315309107]
With missing
aux: [[0.406784][0.44401589]][[0.852455556][1.23527527]][0 0][0.198623136 0.62609148][0.198623136 0.62609148]
main: [[nan][nan]][[-3.2140317][-2.22139478]][1 1][nan nan][0 0]
Normal
aux: [[0.490041673][0.00489727268]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.867286][0.949406743]][[nan][nan]][1 1][nan nan][0 0]
aux: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
main: [[0.630184174][0.391073674]][[nan][nan]][1 1][nan nan][0 0]
{'loss': [0.08247146010398865], 'dense_1_loss': [0.41235730051994324], 'dense_2_loss': [0.0]}

Answer

It seems like a problem similar to this TF issue about tf.where().

When y_true is nan, the gradient of filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans) is computed as d/dw (filtered) = 1 * d/dw (tf.zeros_like) + 0 * d/dw (with_nans). Since d/dw (with_nans) is nan in this case, the final gradient is 1 * 0 + 0 * nan = nan.
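
Here is a minimal sketch of that effect, assuming the same TF 1.x API the question uses (tf.is_nan, tf.where, tf.gradients); x and with_nans are just stand-ins for a weight and the element-wise loss:

import tensorflow as tf

x = tf.constant([1.0])
with_nans = x * float("nan")                     # forward value: nan
nans = tf.is_nan(with_nans)
filtered = tf.where(nans, tf.zeros_like(with_nans), with_nans)  # forward value: 0.0

grad = tf.gradients(filtered, x)[0]

with tf.Session() as sess:
    # filtered evaluates to [0.], but the gradient is [nan], not [0.],
    # because the chain rule multiplies 0 * nan through the discarded branch.
    print(sess.run([filtered, grad]))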

To avoid this issue, instead of setting the nan loss values to 0, you can set y_true to y_pred in order to get 0 loss values whenever y_true is nan.

# Drop-in replacement for filtered_loss_function inside loss_0_where_nan above:
# mask y_true (rather than the loss), so no NaN enters the computation graph.
def filtered_loss_function(y_true, y_pred):
    nans = tf.is_nan(y_true)
    masked_y_true = tf.where(nans, y_pred, y_true)
    filtered = loss_function(masked_y_true, y_pred)
    return filtered

Since filtered no longer depends on nan values (the values are masked out before entering the loss function), the gradients will not have nans.

>>> model.get_weights()
[array([[ 0.9761261 , -0.7472908 ],
        [-0.12295872,  0.39413464],
        [-0.16676795,  0.30844116]], dtype=float32),
 array([-0.00581209,  0.00300716], dtype=float32),
 array([[-0.31789184],
        [-0.87912357]], dtype=float32),
 array([0.00628144], dtype=float32),
 array([[-1.0932552 ],
        [ 0.11788104]], dtype=float32),
 array([0.00575602], dtype=float32)]
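
For comparison, a quick sanity check of the masking approach, again only a sketch under the question's TF 1.x API: when y_true is nan it is replaced by y_pred before the loss is computed, so both the loss and its gradient come out as 0 rather than nan.

import tensorflow as tf

y_pred = tf.constant([1.0])
y_true = tf.constant([float("nan")])

nans = tf.is_nan(y_true)
masked_y_true = tf.where(nans, y_pred, y_true)   # nan entries replaced by y_pred
loss = tf.reduce_mean(tf.square(masked_y_true - y_pred))
grad = tf.gradients(loss, y_pred)[0]

with tf.Session() as sess:
    # Prints a loss of 0.0 and a gradient of [0.] -- no nan anywhere.
    print(sess.run([loss, grad]))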
