tf.keras loss becomes NaN

Problem Description

I'm programming a neural network in tf.keras, with 3 layers. My dataset is the MNIST dataset. I decreased the number of examples in the dataset, so the runtime is lower. This is my code:

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import pandas as pd

!git clone https://github.com/DanorRon/data
%cd data
!ls

batch_size = 32
epochs = 10
alpha = 0.0001
lambda_ = 0
h1 = 50

train = pd.read_csv('/content/first-repository/mnist_train.csv.zip')
test = pd.read_csv('/content/first-repository/mnist_test.csv.zip')

train = train.loc['1':'5000', :]
test = test.loc['1':'2000', :]

train = train.sample(frac=1).reset_index(drop=True)
test = test.sample(frac=1).reset_index(drop=True)

x_train = train.loc[:, '1x1':'28x28']
y_train = train.loc[:, 'label']

x_test = test.loc[:, '1x1':'28x28']
y_test = test.loc[:, 'label']

x_train = x_train.values
y_train = y_train.values

x_test = x_test.values
y_test = y_test.values

nb_classes = 10
targets = y_train.reshape(-1)
y_train_onehot = np.eye(nb_classes)[targets]

nb_classes = 10
targets = y_test.reshape(-1)
y_test_onehot = np.eye(nb_classes)[targets]

model = tf.keras.Sequential()
model.add(layers.Dense(784, input_shape=(784,)))
model.add(layers.Dense(h1, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))
model.add(layers.Dense(10, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))

model.compile(optimizer=tf.train.GradientDescentOptimizer(alpha),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train_onehot, epochs=epochs, batch_size=batch_size)

Whenever I run it, one of 3 things happens:

1. The loss decreases and the accuracy increases for a few epochs, until the loss becomes NaN for no apparent reason and the accuracy plummets.

2. The loss and accuracy stay the same for each epoch. Usually the loss is 2.3025 and the accuracy is 0.0986.

3. The loss starts at NaN (and stays that way), while the accuracy stays low.

Most of the time, the model does one of these things, but sometimes it does something random. It seems like the type of erratic behavior that occurs is completely random. I have no idea what the problem is. How do I fix this problem?

Edit: Sometimes, the loss decreases, but the accuracy stays the same. Also, sometimes the loss decreases and the accuracy increases, then after a while the accuracy decreases while the loss still decreases. Or, the loss decreases and the accuracy increases, then it switches and the loss goes up fast while the accuracy plummets, eventually ending with loss: 2.3025 acc: 0.0986.

Edit 2: This is an example of something that sometimes happens:

Epoch 1/100
49999/49999 [==============================] - 5s 92us/sample - loss: 1.8548 - acc: 0.2390
Epoch 2/100
49999/49999 [==============================] - 5s 104us/sample - loss: 0.6894 - acc: 0.8050
Epoch 3/100
49999/49999 [==============================] - 4s 90us/sample - loss: 0.4317 - acc: 0.8821
Epoch 4/100
49999/49999 [==============================] - 5s 104us/sample - loss: 2.2178 - acc: 0.1345
Epoch 5/100
49999/49999 [==============================] - 5s 90us/sample - loss: 2.3025 - acc: 0.0986
Epoch 6/100
49999/49999 [==============================] - 4s 90us/sample - loss: 2.3025 - acc: 0.0986
Epoch 7/100
49999/49999 [==============================] - 4s 89us/sample - loss: 2.3025 - acc: 0.0986

Edit 3: I changed the loss to mean squared error and the network works well now. Is there a way to keep it in cross entropy without it converging to a local minimum?

Recommended Answer

I changed the loss to mean squared error and the network works well now

MSE is not the appropriate loss function for such classification problems; you should certainly stick to loss = 'categorical_crossentropy'.

Most probably, the issue is due to your MNIST data not being normalized; you should normalize your final variables as

x_train = x_train.values/255
x_test = x_test.values/255

Not normalizing input data is a known cause of exploding gradient problems, which is probably what is happening here.
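
As a quick sanity check (an illustrative addition, not part of the original answer), you can confirm the raw pixel range before and after scaling:

print(x_train.min(), x_train.max())  # raw MNIST pixels: expect 0 and 255
x_train = x_train / 255  # scale to [0, 1], as suggested above
x_test = x_test / 255
print(x_train.min(), x_train.max())  # after scaling: expect 0.0 and 1.0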

Other advice: set activation='relu' for your first dense layer, and get rid of both the regularizer & initializer arguments from all layers (the default glorot_uniform is actually a better initializer, while regularization here may actually be harmful for the performance).
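
Putting that advice together, a minimal sketch of the revised model (reusing alpha and h1 from the question, and keeping the asker's layer sizes and optimizer unchanged):

model = tf.keras.Sequential()
model.add(layers.Dense(784, activation='relu', input_shape=(784,)))  # relu on the first dense layer
model.add(layers.Dense(h1, activation='relu'))  # kernel_regularizer removed
model.add(layers.Dense(10, activation='sigmoid'))  # kernel_regularizer removed

model.compile(optimizer=tf.train.GradientDescentOptimizer(alpha),
              loss='categorical_crossentropy',  # stick with cross entropy, not MSE
              metrics=['accuracy'])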

As general advice, try not to reinvent the wheel - start with a Keras example using the built-in MNIST data...
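
For instance, the bundled dataset loads in one call (a sketch assuming the standard tf.keras dataset utilities; it replaces the CSV download and manual slicing above):

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0  # flatten 28x28 images and scale to [0, 1]
x_test = x_test.reshape(-1, 784) / 255.0
y_train_onehot = tf.keras.utils.to_categorical(y_train, 10)  # one-hot labels, equivalent to the np.eye trick
y_test_onehot = tf.keras.utils.to_categorical(y_test, 10)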
