Keras model fails to decrease loss


Problem description

I propose an example in which a tf.keras model fails to learn from very simple data. I'm using tensorflow-gpu==2.0.0, keras==2.3.0 and Python 3.7. At the end of my post, I give the Python code to reproduce the problem I observed.

  1. The data

The samples are Numpy arrays of shape (6, 16, 16, 16, 3). To make things very simple, I only consider arrays full of 1s and 0s. Arrays with 1s are given the label 1 and arrays with 0s are given the label 0. I can generate some samples (in the following, n_samples = 240) with this code:

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            # all-ones sample with one-hot label [0., 1.]
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            # all-zeros sample with one-hot label [1., 0.]
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])
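
As a quick sanity check (not part of the original post), iterating the generator confirms what the loop above produces: 119 all-ones samples with the one-hot label [0., 1.] and 121 all-zeros samples with the label [1., 0.]:

labels = [int(label[1]) for _, label in generate_fake_data()]
print(sum(labels), len(labels) - sum(labels))  # prints: 119 121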

In order to input this data in a tf.keras model, I create an instance of tf.data.Dataset using the code below. This will essentially create shuffled batches of BATCH_SIZE = 12 samples.

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
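
This is not in the original question, but one quick way to confirm the batch shapes the dataset yields (assuming BATCH_SIZE = 12 as above and TensorFlow 2.x eager execution) is to pull a single batch:

train_ds = make_tfdataset(for_training=True)
for x_batch, y_batch in train_ds.take(1):
    print(x_batch.shape, y_batch.shape)  # (12, 6, 16, 16, 16, 3) (12, 2)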

  2. The model

I propose the following model to classify my samples:

def create_model(in_shape=(6, 16, 16, 16, 3)):

    input_layer = Input(shape=in_shape)

    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)

    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)

    relu_layer_1 = ReLU()(conv3d_layer)

    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)

    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)

    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)

    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)

    relu_layer_2 = ReLU()(conv1d_layer)

    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)

    out = Dense(units=2, activation='softmax')(reshape_layer_2)

    return Model(inputs=[input_layer], outputs=[out])

The model is optimized using Adam (with default parameters) and the categorical_crossentropy loss:

clf_model = create_model()
clf_model.compile(optimizer=Adam(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])

The output of clf_model.summary() is:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 6, 16, 16, 16, 3) 0         
_________________________________________________________________
lambda (Lambda)              (None, 16, 16, 16, 3)     0         
_________________________________________________________________
conv3d (Conv3D)              (None, 8, 8, 8, 64)       98368     
_________________________________________________________________
re_lu (ReLU)                 (None, 8, 8, 8, 64)       0         
_________________________________________________________________
global_average_pooling3d (Gl (None, 64)                0         
_________________________________________________________________
lambda_1 (Lambda)            (None, 384)               0         
_________________________________________________________________
lambda_2 (Lambda)            (None, 1, 384)            0         
_________________________________________________________________
conv1d (Conv1D)              (None, 1, 1)              385       
_________________________________________________________________
re_lu_1 (ReLU)               (None, 1, 1)              0         
_________________________________________________________________
lambda_3 (Lambda)            (None, 1)                 0         
_________________________________________________________________
dense (Dense)                (None, 2)                 4         
=================================================================
Total params: 98,757
Trainable params: 98,757
Non-trainable params: 0

  3. Training

The model is trained for 500 epochs as follows:

train_ds = make_tfdataset(for_training=True)

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)

  4. The problem!

During the 500 epochs, the model loss stays around 0.69 and never goes below 0.69. This is also true if I set the learning rate to 1e-2 instead of 1e-3. The data is very simple (just 0s and 1s). Naively, I would expect the model to have a better accuracy than just 0.6. In fact, I would expect it to reach 100% accuracy quickly. What am I doing wrong?

  5. Complete code...

import numpy as np
import tensorflow as tf
import tensorflow.keras.backend as K
from math import ceil
from tensorflow.keras.layers import Input, Dense, Lambda, Conv1D, GlobalAveragePooling3D, Conv3D, ReLU
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

BATCH_SIZE = 12


def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones((6, 16, 16, 16, 3)), np.array([0., 1.])
        else:
            yield np.zeros((6, 16, 16, 16, 3)), np.array([1., 0.])


def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32,
                                                           tf.float32),
                                             output_shapes=(tf.TensorShape([6, 16, 16, 16, 3]),
                                                            tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset


def create_model(in_shape=(6, 16, 16, 16, 3)):

    input_layer = Input(shape=in_shape)

    reshaped_input = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(input_layer)

    conv3d_layer = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2), padding='same')(reshaped_input)

    relu_layer_1 = ReLU()(conv3d_layer)

    pooling_layer = GlobalAveragePooling3D()(relu_layer_1)

    reshape_layer_1 = Lambda(lambda x: K.reshape(x, (-1, in_shape[0] * 64)))(pooling_layer)

    expand_dims_layer = Lambda(lambda x: K.expand_dims(x, 1))(reshape_layer_1)

    conv1d_layer = Conv1D(filters=1, kernel_size=1)(expand_dims_layer)

    relu_layer_2 = ReLU()(conv1d_layer)

    reshape_layer_2 = Lambda(lambda x: K.squeeze(x, 1))(relu_layer_2)

    out = Dense(units=2, activation='softmax')(reshape_layer_2)

    return Model(inputs=[input_layer], outputs=[out])


train_ds = make_tfdataset(for_training=True)
clf_model = create_model(in_shape=(6, 16, 16, 16, 3))
clf_model.summary()
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy', 'categorical_crossentropy'])

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),
                        verbose=1)

Recommended answer

Your code has a single critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension - as it, by definition, holds independent samples of your data. In your first reshape, you mix feature dimensions with the batch dimension:

Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)

This is like feeding 72 independent samples of shape (16,16,16,3). Further layers suffer similar problems.
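
The two printouts above can be reproduced with a minimal sketch (not part of the original answer) that rebuilds only the input layer and the first Lambda from the question's create_model, with the batch size fixed at 12 so all dimensions are known:

import tensorflow.keras.backend as K
from tensorflow.keras.layers import Input, Lambda

in_shape = (6, 16, 16, 16, 3)
ipt = Input(batch_shape=(12, *in_shape))                             # (12, 6, 16, 16, 16, 3)
reshaped = Lambda(lambda x: K.reshape(x, (-1, *in_shape[1:])))(ipt)  # (72, 16, 16, 16, 3)

print(ipt)       # the batch dimension is 12
print(reshaped)  # 12 samples * 6 sub-arrays are merged into a single batch axis of 72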


Solutions:

  • Instead of reshaping at every step of the way (for which you should use Reshape), shape your existing Conv and pooling layers so that everything works out directly.
  • Aside from the input and output layers, it's better to give each layer a short, simple name - no clarity is lost, as each line is well defined by the layer name.
  • GlobalAveragePooling is intended to be the final layer, as it collapses the feature dimensions - in your case, like so: (12, 16, 16, 16, 3) --> (12, 3); a Conv afterwards serves little purpose.
  • Per the above, I replaced Conv1D with Conv3D.
  • Unless you're using variable batch sizes, always go for batch_shape= vs. shape=, as you can inspect layer dimensions in full (very helpful).
  • Your true batch_size here is 6, deduced from your comment reply.
  • kernel_size=1 and (especially) filters=1 is a very weak convolution; I replaced it accordingly - you can revert if you wish.
  • If you have only 2 classes in your intended application, I advise using Dense(1, 'sigmoid') with the binary_crossentropy loss (see the sketch right after this list).
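
A minimal sketch of that last suggestion (this variant is not part of the original answer; create_binary_model is a hypothetical name, it reuses the Conv3D architecture shown below and assumes the generator is changed to yield scalar 0./1. labels instead of one-hot pairs):

from tensorflow.keras.layers import Input, Conv3D, GlobalAveragePooling3D, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def create_binary_model(batch_size=6, input_shape=(16, 16, 16, 3)):
    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                 activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8, kernel_size=4, strides=(2, 2, 2),
                 activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)
    out = Dense(1, activation='sigmoid')(x)   # one unit instead of Dense(2, 'softmax')
    return Model(ipt, out)

model = create_binary_model()
model.compile(optimizer=Adam(),
              loss='binary_crossentropy',     # instead of categorical_crossentropy
              metrics=['accuracy'])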

As a last note: you can toss all of the above out except for the dimensionality shuffling advice, and still get perfect train set performance; it was the root of the problem.

def create_model(batch_size, input_shape):

    # batch_shape fixes the batch dimension, so every layer's shape is fully known
    ipt = Input(batch_shape=(batch_size, *input_shape))
    # two Conv3D blocks replace the question's Lambda-reshape / Conv1D pipeline
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                             activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                             activation='relu', padding='same')(x)
    # GlobalAveragePooling collapses the spatial dimensions as the final feature layer
    x   = GlobalAveragePooling3D()(x)
    out = Dense(units=2, activation='softmax')(x)

    return Model(inputs=ipt, outputs=out)

BATCH_SIZE = 6
INPUT_SHAPE = (16, 16, 16, 3)
BATCH_SHAPE = (BATCH_SIZE, *INPUT_SHAPE)

def generate_fake_data():
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones(INPUT_SHAPE), np.array([0., 1.])
        else:
            yield np.zeros(INPUT_SHAPE), np.array([1., 0.])


def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                 output_types=(tf.float32,
                                               tf.float32),
                                 output_shapes=(tf.TensorShape(INPUT_SHAPE),
                                                tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
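
The answer stops at the dataset definition; below is a minimal sketch of the training wiring, carried over from the question's script rather than stated by the answer (optimizer, loss and steps_per_epoch are assumptions; 240 samples with BATCH_SIZE = 6 gives the 40 steps per epoch seen in the output):

from math import ceil
from tensorflow.keras.optimizers import Adam

train_ds  = make_tfdataset(for_training=True)
clf_model = create_model(BATCH_SIZE, INPUT_SHAPE)
clf_model.compile(optimizer=Adam(lr=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

history = clf_model.fit(train_ds,
                        epochs=500,
                        steps_per_epoch=ceil(240 / BATCH_SIZE),  # 40 steps per epoch
                        verbose=1)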


Results:

Epoch 28/500
40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000
