Keras - Validation Loss and Accuracy stuck at 0

Problem Description

I am trying to train a simple 2-layer fully connected neural net for binary classification in TensorFlow Keras (tf.keras). I have split my data into training and validation sets with an 80-20 split using sklearn's train_test_split().

When I call model.fit(X_train, y_train, validation_data=[X_val, y_val]), it shows 0 validation loss and accuracy for all epochs, but it trains just fine.

Also, when I try to evaluate it on the validation set, the output is non-zero.

Can someone please explain why I am seeing this 0 loss / 0 accuracy problem on validation? Thanks for your help.

Here is the complete sample code (MCVE) for this error: https://colab.research.google.com/drive/1P8iCUlnD87vqtuS5YTdoePcDOVEKpBHr?usp=sharing
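
In case the notebook becomes unavailable, here is a minimal, self-contained sketch of the kind of setup described above. The layer sizes, the synthetic data, and the split parameters are assumptions for illustration only, not the actual MCVE:

    import numpy as np
    import tensorflow as tf
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the real dataset (assumed shapes, illustration only).
    X = np.random.rand(575, 10).astype("float32")
    y = np.random.randint(0, 2, size=(575,)).astype("float32")

    # 80-20 train/validation split, as described above.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

    # A simple 2-layer fully connected binary classifier in tf.keras.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Passing validation_data as a *list* reproduces the symptom on the affected
    # tf.keras versions: val_loss and val_accuracy are reported as 0.
    # (Newer versions may instead raise an error or handle the list correctly.)
    model.fit(X_train, y_train, validation_data=[X_val, y_val], epochs=10, batch_size=64)

    # Evaluating directly on the validation set returns non-zero values.
    print(model.evaluate(X_val, y_val))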

Answer

    • If you use keras instead of tf.keras, everything works fine.

      With tf.keras, I even tried validation_data = [X_train, y_train]; this also gives zero accuracy.

      Here is a demonstration:

      model.fit(X_train, y_train, validation_data=[X_train.to_numpy(), y_train.to_numpy()], 
      epochs=10, batch_size=64)
      
      Epoch 1/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.7898 - accuracy: 0.6087 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 2/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.6710 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 3/10
      8/8 [==============================] - 0s 5ms/step - loss: 0.6748 - accuracy: 0.6500 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 4/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.6716 - accuracy: 0.6370 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 5/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.6085 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 6/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.6744 - accuracy: 0.6326 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 7/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.6102 - accuracy: 0.6522 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 8/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.7032 - accuracy: 0.6109 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 9/10
      8/8 [==============================] - 0s 5ms/step - loss: 0.6283 - accuracy: 0.6717 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      Epoch 10/10
      8/8 [==============================] - 0s 5ms/step - loss: 0.6120 - accuracy: 0.6652 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
      

      So there is definitely some issue with the tf.keras implementation of fit.

      I dug into the source, and this seems to be the part responsible for handling validation_data:

      ...
      ...
              # Run validation.
              if validation_data and self._should_eval(epoch, validation_freq):
                val_x, val_y, val_sample_weight = (
                    data_adapter.unpack_x_y_sample_weight(validation_data))
                val_logs = self.evaluate(
                    x=val_x,
                    y=val_y,
                    sample_weight=val_sample_weight,
                    batch_size=validation_batch_size or batch_size,
                    steps=validation_steps,
                    callbacks=callbacks,
                    max_queue_size=max_queue_size,
                    workers=workers,
                    use_multiprocessing=use_multiprocessing,
                    return_dict=True)
                val_logs = {'val_' + name: val for name, val in val_logs.items()}
                epoch_logs.update(val_logs)
      

      This internally calls model.evaluate. Since we have already established that evaluate works fine, I realized the only remaining culprit could be unpack_x_y_sample_weight.

      So, I looked into its implementation:

      def unpack_x_y_sample_weight(data):
        """Unpacks user-provided data tuple."""
        if not isinstance(data, tuple):
          return (data, None, None)
        elif len(data) == 1:
          return (data[0], None, None)
        elif len(data) == 2:
          return (data[0], data[1], None)
        elif len(data) == 3:
          return (data[0], data[1], data[2])
      
        raise ValueError("Data not understood.")
      
      

      It's crazy, but if you just pass a tuple instead of a list, everything works fine because of the isinstance(data, tuple) check inside unpack_x_y_sample_weight. (With a list, that check fails, the whole list is treated as x, and the labels come back as None; evaluate somehow handles this without erroring, so validation runs with no real labels and the reported metrics stay at zero. This seems like a bug, but the documentation clearly states to pass a tuple.)
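
      To see that check in isolation, here is a small sketch that feeds a list and a tuple through the function (its body is copied verbatim from the snippet above so the example is self-contained):

      import numpy as np

      # Copied from the tf.keras source quoted above.
      def unpack_x_y_sample_weight(data):
        """Unpacks user-provided data tuple."""
        if not isinstance(data, tuple):
          return (data, None, None)
        elif len(data) == 1:
          return (data[0], None, None)
        elif len(data) == 2:
          return (data[0], data[1], None)
        elif len(data) == 3:
          return (data[0], data[1], data[2])

        raise ValueError("Data not understood.")

      X_val = np.random.rand(4, 3)
      y_val = np.random.randint(0, 2, size=(4, 1))

      # A list fails the isinstance(data, tuple) check, so the whole list is
      # treated as x and the labels come back as None.
      x, y, _ = unpack_x_y_sample_weight([X_val, y_val])
      print(type(x), y)        # <class 'list'> None

      # A tuple hits the len(data) == 2 branch and the labels are preserved.
      x, y, _ = unpack_x_y_sample_weight((X_val, y_val))
      print(x.shape, y.shape)  # (4, 3) (4, 1)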

      The following code gives correct validation accuracy and loss:

      model.fit(X_train, y_train, validation_data=(X_train.to_numpy(), y_train.to_numpy()), 
      epochs=10, batch_size=64)
      
      Epoch 1/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.5832 - accuracy: 0.6696 - val_loss: 0.6892 - val_accuracy: 0.6674
      Epoch 2/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.6385 - accuracy: 0.6804 - val_loss: 0.8984 - val_accuracy: 0.5565
      Epoch 3/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.6822 - accuracy: 0.6391 - val_loss: 0.6556 - val_accuracy: 0.6739
      Epoch 4/10
      8/8 [==============================] - 0s 6ms/step - loss: 0.6276 - accuracy: 0.6609 - val_loss: 1.0691 - val_accuracy: 0.5630
      Epoch 5/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.7048 - accuracy: 0.6239 - val_loss: 0.6474 - val_accuracy: 0.6326
      Epoch 6/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.6545 - accuracy: 0.6500 - val_loss: 0.6659 - val_accuracy: 0.6043
      Epoch 7/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.5796 - accuracy: 0.6913 - val_loss: 0.6891 - val_accuracy: 0.6435
      Epoch 8/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.5915 - accuracy: 0.6891 - val_loss: 0.5307 - val_accuracy: 0.7152
      Epoch 9/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.5571 - accuracy: 0.7000 - val_loss: 0.5465 - val_accuracy: 0.6957
      Epoch 10/10
      8/8 [==============================] - 0s 7ms/step - loss: 0.7133 - accuracy: 0.6283 - val_loss: 0.7046 - val_accuracy: 0.6413
      

      So, as this seems to be a bug, I have just opened a relevant issue on the TensorFlow GitHub repo:

      https://github.com/tensorflow/tensorflow/issues/39370
