Keras model fails to decrease loss


Problem description

I propose an example in which a tf.keras model fails to learn from very simple data. I'm using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of my post, I give the Python code to reproduce the problem I observed.

  1. The data

The samples are Numpy arrays of shape (6, 16, 16, 16, 3). To make things very simple, I only consider arrays full of 1s and 0s. Arrays with 1s are given the label 1 and arrays with 0s are given the label 0. I can generate some samples (in the following, n_samples = 240) with this code:

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            # all-ones sample with one-hot label [0., 1.]
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            # all-zeros sample with one-hot label [1., 0.]
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])
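
As a quick sanity check (not part of the original post), iterating the generator confirms what the loop above produces: 119 all-ones samples with the one-hot label [0., 1.] and 121 all-zeros samples with the label [1., 0.]:

labels = [int(label[1]) for _, label in generate_fake_data()]
print(sum(labels), len(labels) - sum(labels))  # prints: 119 121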

In order to input this data in a tf.keras model, I create an instance of tf.data.Dataset using the code below. This will essentially create shuffled batches of BATCH_SIZE = 12 samples.

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
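
This is not in the original question, but one quick way to confirm the batch shapes the dataset yields (assuming BATCH_SIZE = 12 as above and TensorFlow 2.x eager execution) is to pull a single batch:

train_ds = make_tfdataset(for_training=True)
for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape, y_batch.shape)  # (12, 6, 16, 16, 16, 3) (12, 2)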

  2. The model

I propose the following model to classify my samples:

def create_model(in_shape=(6, 16, 16, 16, 3)):

    input_layer = Input(shape=in_shape)

    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)

    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)

    relu_layer_1 = ReLU()(conv3d_layer)

    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)

    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)

    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)

    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)

    relu_layer_2 = ReLU()(conv1d_layer)

    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)

    out = Dense(units=2, activation='softmax')(reshape_layer_2)

    return Model(inputs=[input_layer], outputs=[out])

The model is optimized using Adam (with default parameters) and the categorical_crossentropy loss:

clf_model = create_model()
clf_model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])

The output of clf_model.summary() is:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 6, 16, 16, 16, 3) 0         
_________________________________________________________________
lambda (Lambda)              (None, 16, 16, 16, 3)     0         
_________________________________________________________________
conv3d (Conv3D)              (None, 8, 8, 8, 64)       98368     
_________________________________________________________________
re_lu (ReLU)                 (None, 8, 8, 8, 64)       0         
_________________________________________________________________
global_average_pooling3d (Gl (None, 64)                0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 384)               0         
_________________________________________________________________
lambda_2 (Lambda)            (None, 1, 384)            0         
_________________________________________________________________
conv1d (Conv1D)              (None, 1, 1)              385       
_________________________________________________________________
re_lu_1 (ReLU)               (None, 1, 1)              0         
_________________________________________________________________
lambda_3 (Lambda)            (None, 1)                 0         
_________________________________________________________________
dense (Dense)                (None, 2)                 4         
=================================================================
Total params: 98,757
Trainable params: 98,757
Non-trainable params: 0

  3. Training

The model is trained for 500 epochs as follows:

train_ds = make_tfdataset(for_training=True)

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)

  4. The problem!

During the 500 epochs, the model loss stays around 0.69 and never goes below 0.69. This is also true if I set the learning rate to 1e-2 instead of 1e-3. The data is very simple (just 0s and 1s). Naively, I would expect the model to have a better accuracy than just 0.6. In fact, I would expect it to reach 100% accuracy quickly. What am I doing wrong?

  5. Complete code...

import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from math import ceil
from tensorflow.keras.layers import Input, Dense, Lambda, Conv1D, GlobalAveragePooling3D, Conv3D, ReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

BATCH_SIZE = 12


def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])


def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset


def create_model(in_shape=(6, 16, 16, 16, 3)):

    input_layer = Input(shape=in_shape)

    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)

    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)

    relu_layer_1 = ReLU()(conv3d_layer)

    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)

    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)

    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)

    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)

    relu_layer_2 = ReLU()(conv1d_layer)

    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)

    out = Dense(units=2, activation='softmax')(reshape_layer_2)

    return Model(inputs=[input_layer], outputs=[out])


train_ds = make_tfdataset(for_training=True)
clf_model = create_model(in_shape=(6, 16, 16, 16, 3))
clf_model.summary()
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)

Recommended answer

Your code has a single critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension - as it, by definition, holds independent samples of your data. In your first reshape, you mix feature dimensions with the batch dimension:

Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)

This is like feeding 72 independent samples of shape (16,16,16,3). Further layers suffer similar problems.
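
The two printouts above can be reproduced with a minimal sketch (not part of the original answer) that rebuilds only the input layer and the first Lambda from the question's create_model, with the batch size fixed at 12 so all dimensions are known:

import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Lambda

in_shape = (6, 16, 16, 16, 3)
ipt = Input(batch_shape=(12, *in_shape))                             # (12, 6, 16, 16, 16, 3)
reshaped = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(ipt)  # (72, 16, 16, 16, 3)

print(ipt)       # the batch dimension is 12
print(reshaped)  # 12 samples * 6 sub-arrays are merged into a single batch axis of 72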


Solutions:

  • Instead of reshaping at every step of the way (for which you should use Reshape), shape your existing Conv and pooling layers so that everything works out directly.
  • Aside from the input and output layers, it's better to give each layer a short, simple name - no clarity is lost, as each line is well defined by the layer name.
  • GlobalAveragePooling is intended to be the final layer, as it collapses the feature dimensions - in your case, like so: (12, 16, 16, 16, 3) --> (12, 3); a Conv afterwards serves little purpose.
  • Per the above, I replaced Conv1D with Conv3D.
  • Unless you're using variable batch sizes, always go for batch_shape= vs. shape=, as you can inspect layer dimensions in full (very helpful).
  • Your true batch_size here is 6, deduced from your comment reply.
  • kernel_size=1 and (especially) filters=1 is a very weak convolution; I replaced it accordingly - you can revert if you wish.
  • If you have only 2 classes in your intended application, I advise using Dense(1, 'sigmoid') with the binary_crossentropy loss (see the sketch right after this list).
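
A minimal sketch of that last suggestion (this variant is not part of the original answer; create_binary_model is a hypothetical name, it reuses the Conv3D architecture shown below and assumes the generator is changed to yield scalar 0./1. labels instead of one-hot pairs):

from tensorflow.keras.layers import Input, Conv3D, GlobalAveragePooling3D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def create_binary_model(batch_size=6, input_shape=(16, 16, 16, 3)):
    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                 activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8, kernel_size=4, strides=(2, 2, 2),
                 activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)
    out = Dense(1, activation='sigmoid')(x)   # one unit instead of Dense(2, 'softmax')
    return Model(ipt, out)

model = create_binary_model()
model.compile(optimizer=Adam(),
              loss='binary_crossentropy',     # instead of categorical_crossentropy
              metrics=['accuracy'])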

As a last note: you can toss all of the above out except for the dimensionality shuffling advice, and still get perfect train set performance; it was the root of the problem.

def create_model(batch_size, input_shape):

    # batch_shape fixes the batch dimension, so every layer's shape is fully known
    ipt = Input(batch_shape=(batch_size, *input_shape))
    # two Conv3D blocks replace the question's Lambda-reshape / Conv1D pipeline
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                             activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                             activation='relu', padding='same')(x)
    # GlobalAveragePooling collapses the spatial dimensions as the final feature layer
    x   = GlobalAveragePooling3D()(x)
    out = Dense(units=2, activation='softmax')(x)

    return Model(inputs=ipt, outputs=out)

BATCH_SIZE = 6
INPUT_SHAPE = (16, 16, 16, 3)
BATCH_SHAPE = (BATCH_SIZE, *INPUT_SHAPE)

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones(INPUT_SHAPE), np.array([0., 1.])
        else:
            yield np.zeros(INPUT_SHAPE), np.array([1., 0.])


def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                 output_types=(tf.float32,
                                               tf.float32),
                                 output_shapes=(tf.TensorShape(INPUT_SHAPE),
                                                tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
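
The answer stops at the dataset definition; below is a minimal sketch of the training wiring, carried over from the question's script rather than stated by the answer (optimizer, loss and steps_per_epoch are assumptions; 240 samples with BATCH_SIZE = 6 gives the 40 steps per epoch seen in the output):

from math import ceil
from tensorflow.keras.optimizers import Adam

train_ds  = make_tfdataset(for_training=True)
clf_model = create_model(BATCH_SIZE, INPUT_SHAPE)
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),  # 40 steps per epoch
                        verbose=1)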


Results:

Epoch 28/500
40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000
