Issue of batch sizes when using custom loss functions in Keras

Problem description

I am doing a slight modification of a standard neural network by defining a custom loss function. The custom loss function depends not only on y_true and y_pred, but also on the training data. I implemented it using the wrapping solution described here.

Specifically, I wanted to define a custom loss function that is the standard mse plus the mse between the input and the square of y_pred:

from keras import backend as K

def custom_loss(x_true):
    def loss(y_true, y_pred):
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true))
    return loss

Then I compile the model using

model_custom.compile(loss = custom_loss( x_true=training_data ), optimizer='adam')

and fit the model using

model_custom.fit(training_data, training_label, epochs=100, batch_size = training_data.shape[0])

All of the above works fine, because the batch size is actually the number of all the training samples.

But if I set a different batch_size (e.g., 10) when I have 1000 training samples, there will be an error

Incompatible shapes: [1000] vs. [10].

It seems that Keras is able to automatically adjust the size of the inputs to its own loss functions based on the batch size, but cannot do so for the custom loss function.

Do you know how to solve this issue?

Thanks!

==========================================================================

Thank you, Ori, for the suggestion of concatenating the input and output layers! It "worked", in the sense that the code can run under any batch size. However, the result from training the new model seems to be wrong... Below is a simplified version of the code that demonstrates the problem:

import numpy as np
import scipy.io
import keras
from keras import backend as K
from keras.models import Model
from keras.layers import Input, Dense, Activation
from numpy.random import seed
from tensorflow import set_random_seed

def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
    mse = K.mean( K.square( y_pred[:,2] - y_true ) )
    return mse

# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0

# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )

training_data  = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data   = x[5000:6000:1,:]
testing_label  = y[5000:6000:1]

# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)

input_standard = Input(shape=(2,))                                               # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard)                 # output layer

model_standard = Model(inputs=[input_standard], outputs=[output_standard])     # build the model
model_standard.compile(loss='mean_squared_error', optimizer='adam')            # compile the model
model_standard.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_standard = model_standard.predict(testing_data)             # make prediction

# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000

# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)

input_custom = Input(shape=(2,))                                             # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom)            # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])

model_custom = Model(inputs=[input_custom], outputs=[output_custom])         # build the model
model_custom.compile(loss = custom_loss, optimizer='adam')                   # compile the model
model_custom.fit(training_data, training_label, epochs=50, batch_size = 500) # train the model
testing_label_pred_custom = model_custom.predict(testing_data)               # make prediction

# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000

# compare the result
print( [ mse_standard , mse_custom ] )

Basically, I have a standard one-hidden-layer neural network, and a custom one-hidden-layer neural network whose output layer is concatenated with the input layer. For testing purposes, I did not use the concatenated input layer in the custom loss function, because I wanted to see whether the custom network can reproduce the standard neural network. Since the custom loss function is equivalent to the standard 'mean_squared_error' loss, both networks should have the same training results (I also reset the random seeds to make sure that they have the same initialization).

However, the training results are very different. It seems that the concatenation makes the training process different? Any ideas?

Thanks again for your help!

Final update: Ori's approach of concatenating the input and output layers works, and it has been verified using a generator. Thank you!!

Recommended answer

The problem is that when compiling the model, you set x_true to be a static tensor with the size of all the samples, while the inputs to Keras loss functions are y_true and y_pred, each of size [batch_size, :].
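
The mismatch can be reproduced outside Keras. The following is a minimal NumPy sketch, assuming 1000 samples with a single value each and a batch of 10; the array names are illustrative and not taken from the original code:

```python
import numpy as np

x_true = np.random.rand(1000)      # captured once at compile time: all samples
y_true_batch = np.random.rand(10)  # what the loss receives at each step: one batch

# A batch of 10 cannot be combined element-wise with all 1000 samples.
try:
    np.square(y_true_batch - x_true)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# When the batch is the whole data set, the shapes agree and nothing fails.
full_batch_term = np.square(x_true - x_true)
print(full_batch_term.shape)
```

This is why the original code only runs when batch_size equals the number of training samples.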

As I see it, there are two ways to solve this. The first is to use a generator to create the batches, so that you control which indices are evaluated each time, and in the loss function you slice the x_true tensor to fit the samples being evaluated:

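def custom_loss(x_true):
    def loss(y_true, y_pred):
        # relevant_samples is a placeholder: it must return the rows of x_true
        # that belong to the batch currently being evaluated
        x_true_samples = relevant_samples(x_true)
        return K.mean(K.square(y_pred - y_true) + K.square(y_true - x_true_samples))
    return loss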
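
One way to know which rows belong to each batch is to have the generator hand out the indices it used. The sketch below is only an illustration in NumPy: `indexed_batches` is a hypothetical helper, and how the indices would be passed into the Keras loss is left open here.

```python
import numpy as np

def indexed_batches(x, y, batch_size):
    """Yield (indices, batch_x, batch_y) in a fixed, known order."""
    n_samples = x.shape[0]
    start = 0
    while True:
        # Wrap around at the end of the data so every batch has batch_size rows.
        indices = np.arange(start, start + batch_size) % n_samples
        yield indices, x[indices], y[indices]
        start = (start + batch_size) % n_samples

x = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float).reshape(10, 1)
batches = indexed_batches(x, y, batch_size=4)

# Collect the indices of the first three batches.
seen = [next(batches)[0].tolist() for _ in range(3)]
print(seen)
```

With the indices known, the rows of x_true for a batch are simply `x_true[indices]`.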

This solution can be complicated. What I would suggest is a simpler workaround: concatenate the input layer with the output layer, so that your new output has the form [original_output, input].

Now you can use a new modified loss function:

def loss(y_true, y_pred):
    # y_pred[:, :output_shape] is the original output, y_pred[:, output_shape:] is the input
    return K.mean(K.square(y_pred[:, :output_shape] - y_true[:, :output_shape]) +
                  K.square(y_true[:, :output_shape] - y_pred[:, output_shape:]))

Now your new loss function will take into account both the input data and the prediction.
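
To see what this loss computes on one batch, here is a NumPy sketch of the same arithmetic. It assumes a single output column and a single input column so that the shapes line up; `concat_loss` and the sample values are made up for illustration.

```python
import numpy as np

output_shape = 1  # number of columns produced by the original output layer

def concat_loss(y_true, y_pred):
    prediction = y_pred[:, :output_shape]  # original network output
    inputs = y_pred[:, output_shape:]      # input columns appended by the concatenation
    target = y_true[:, :output_shape]
    return np.mean(np.square(prediction - target) + np.square(target - inputs))

batch_inputs = np.array([[1.0], [2.0], [3.0]])
batch_preds = np.array([[1.5], [2.0], [2.0]])
batch_targets = np.array([[1.0], [2.0], [3.0]])

# The model output has the form [original_output, input].
y_pred = np.concatenate([batch_preds, batch_inputs], axis=1)
loss_value = concat_loss(batch_targets, y_pred)
print(loss_value)
```

Because the input travels inside y_pred, it is always cut to the same batch as y_true, whatever the batch size.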

Edit:
Note that although you set the seed, your models are not exactly the same, and since you did not use a generator, you let Keras choose the batches, so for different models it might pick different samples.
As your model does not converge, different samples can lead to different results.
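
A separate thing worth checking, which is not raised above: in the question's custom_loss, y_pred[:,2] has shape (batch,) while the labels have shape (batch, 1), so the subtraction broadcasts to (batch, batch) under NumPy-style broadcasting rules, which TensorFlow also follows. A small NumPy sketch with made-up values shows the effect:

```python
import numpy as np

y_true = np.array([[1.0], [2.0], [3.0]])  # labels, shape (3, 1)
y_pred = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 2.0],
                   [0.0, 0.0, 4.0]])      # concatenated output, shape (3, 3)

pairwise = y_pred[:, 2] - y_true      # (3,) against (3, 1) broadcasts to (3, 3)
columnwise = y_pred[:, 2:3] - y_true  # (3, 1) against (3, 1) stays (3, 1)

print(pairwise.shape, columnwise.shape)
print(np.mean(np.square(pairwise)), np.mean(np.square(columnwise)))
```

The first mean averages over every prediction/label pair rather than over matching rows, so it is not the mean squared error. Slicing with 2:3 keeps the column dimension and avoids this.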

I added a generator to your code to control which samples are picked for training; now you can see that both results are the same:

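import numpy as np
import keras
from keras.models import Model
from keras.layers import Input, Dense
from numpy.random import seed
from tensorflow import set_random_seed

def custom_loss(y_true, y_pred): # this is essentially the mean_square_error
    # slice with 2:3 so the prediction keeps shape (batch, 1) like y_true;
    # y_pred[:,2] has shape (batch,) and would broadcast against (batch, 1)
    mse = keras.losses.mean_squared_error(y_true, y_pred[:,2:3])
    return mse


def generator(x, y, batch_size):
    # yield consecutive batches in a fixed order, wrapping around at the end of the data
    cur_index = 0
    n_samples = x.shape[0]
    while True:
        batch_x = np.zeros((batch_size, x.shape[1]))
        batch_y = np.zeros((batch_size, y.shape[1]))
        for i in range(batch_size):
            batch_x[i] = x[cur_index,:]
            batch_y[i] = y[cur_index,:]
            cur_index = (cur_index + 1) % n_samples
        yield batch_x, batch_y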

# set the seeds so that we get the same initialization across different trials
seed_numpy = 0
seed_tensorflow = 0

# generate data of x = [ y^3 y^2 ]
y = np.random.rand(5000+1000,1) * 2 # generate 5000 training and 1000 testing samples
x = np.concatenate( ( np.power(y, 3) , np.power(y, 2) ) , axis=1 )

training_data  = x[0:5000:1,:]
training_label = y[0:5000:1]
testing_data   = x[5000:6000:1,:]
testing_label  = y[5000:6000:1]

batch_size = 32



# build the standard neural network with one hidden layer
seed(seed_numpy)
set_random_seed(seed_tensorflow)

input_standard = Input(shape=(2,))                                               # input
hidden_standard = Dense(10, activation='relu', input_shape=(2,))(input_standard) # hidden layer
output_standard = Dense(1, activation='linear')(hidden_standard)                 # output layer

model_standard = Model(inputs=[input_standard], outputs=[output_standard])     # build the model
model_standard.compile(loss='mse', optimizer='adam')            # compile the model
#model_standard.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_standard.fit_generator(generator(training_data,training_label,batch_size),  steps_per_epoch= 32, epochs= 100)
testing_label_pred_standard = model_standard.predict(testing_data)             # make prediction

# get the mean squared error
mse_standard = np.sum( np.power( testing_label_pred_standard - testing_label , 2 ) ) / 1000

# build the neural network with the custom loss
seed(seed_numpy)
set_random_seed(seed_tensorflow)


input_custom = Input(shape=(2,))                                               # input
hidden_custom = Dense(10, activation='relu', input_shape=(2,))(input_custom) # hidden layer
output_custom_temp = Dense(1, activation='linear')(hidden_custom)            # output layer
output_custom = keras.layers.concatenate([input_custom, output_custom_temp])

model_custom = Model(inputs=input_custom, outputs=output_custom)         # build the model
model_custom.compile(loss = custom_loss, optimizer='adam')                   # compile the model
#model_custom.fit(training_data, training_label, epochs=50, batch_size = 10) # train the model
model_custom.fit_generator(generator(training_data,training_label,batch_size),  steps_per_epoch= 32, epochs= 100)
testing_label_pred_custom = model_custom.predict(testing_data)

# get the mean squared error
mse_custom = np.sum( np.power( testing_label_pred_custom[:,2:3:1] - testing_label , 2 ) ) / 1000

# compare the result
print( [ mse_standard , mse_custom ] )
