Different loss values for test_on_batch and train_on_batch


Question

While trying to train a GAN for image generation I ran into a problem which I cannot explain.

When training the generator, the loss returned by train_on_batch drops straight to zero after just 2 or 3 iterations. After investigating I noticed some strange behavior of the train_on_batch method:

When I check the following:

noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
predictions = GAN.stackedModel.predict(noise)

This returns values all close to zero, as I would expect since the generator is not trained yet.

But:

y = np.ones([batch_size, 1])
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
loss = GAN.stackedModel.train_on_batch(noise, y)

Here the loss is almost zero, even though my expected targets are obviously ones. When I run:

y = np.ones([batch_size, 1])
noise = np.random.uniform(-1.0, 1.0, size=[batch_size, gen_noise_length])
loss = GAN.stackedModel.test_on_batch(noise, y)

the returned loss is high, as I would expect.
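As a sanity check, the scale of that loss can be reproduced by hand: with all-ones targets, binary cross-entropy reduces to -log(p), so an untrained discriminator outputting values near zero must produce a large loss. A minimal numpy sketch (the prediction values 0.05 and 0.95 are made up for illustration, not taken from the actual model):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean binary cross-entropy over the batch; clip like Keras does internally
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(np.mean(-(y_true * np.log(y_pred)
                           + (1 - y_true) * np.log(1 - y_pred))))

y = np.ones(4)                    # targets: all "real"
untrained = np.full(4, 0.05)      # discriminator output near zero
confident = np.full(4, 0.95)      # discriminator output near one

print(binary_crossentropy(y, untrained))  # ~3.0: high loss, like test_on_batch
print(binary_crossentropy(y, confident))  # ~0.05: near-zero loss, like train_on_batch
```

So a near-zero loss can only occur if the network's effective outputs during that call are close to one, which is the puzzle described above.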

What is going on with the train_on_batch method? I'm really clueless here...

Edit

My loss is binary cross-entropy, and I build the model like this:

from keras.layers import Input
from keras.models import Model
from keras.optimizers import RMSprop

def createStackedModel(self):
    # Build stacked GAN model: generator followed by discriminator
    gan_in = Input([self.noise_length])
    H = self.genModel(gan_in)
    gan_V = self.disModel(H)
    GAN = Model(gan_in, gan_V)
    opt = RMSprop(lr=0.0001, decay=3e-8)
    GAN.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    return GAN

Edit 2

The generator is constructed by stacking several blocks like this one, each containing a BatchNormalization:

    self.G.add(UpSampling2D())
    self.G.add(Conv2DTranspose(int(depth/8), 5, padding='same'))
    self.G.add(BatchNormalization(momentum=0.5))
    self.G.add(Activation('relu'))

Edit 3

I uploaded my code to https://gitlab.com/benjamingraf24/DCGAN/. Apparently the problem results from the way I build the GAN network, so there must be something wrong in GANBuilder.py. However, I can't find it...

Answer

BatchNormalization layers behave differently during the training and testing phases.

During the training phase they use the mean and variance of the current batch's activations to normalize.

However, during the testing phase they use the moving mean and moving variance that they collected during training. Without enough prior training, these collected values can be far from the actual batch statistics, resulting in significantly different loss values.
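This difference can be illustrated without Keras at all. A minimal numpy sketch (the input distribution is made up; the moving statistics are the fresh mean=0, variance=1 that Keras initializes them to at the start of training):

```python
import numpy as np

eps = 1e-3
# Fake pre-BN activations: mean ~5, std ~2, far from standardized
x = np.random.RandomState(0).normal(loc=5.0, scale=2.0, size=(32, 1))

# Training mode: normalize with the statistics of the current batch
train_out = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Inference mode: normalize with the (barely updated) moving statistics
moving_mean, moving_var = 0.0, 1.0
test_out = (x - moving_mean) / np.sqrt(moving_var + eps)

print(train_out.mean(), train_out.std())  # ~0, ~1: well standardized
print(test_out.mean(), test_out.std())    # ~5, ~2: wildly different activations
```

The same weights therefore produce very different downstream activations in the two modes, which is exactly the train_on_batch vs. test_on_batch gap observed in the question.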

Refer to the Keras documentation for BatchNormalization. The momentum argument defines how fast the moving mean and moving variance adapt to the freshly collected batch statistics during training.
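To see how much momentum matters, here is a sketch of the moving-average update rule (the target batch mean of 5.0 and tolerance are made-up illustration values; the update formula is the standard exponential moving average used by BatchNormalization):

```python
import numpy as np

def batches_to_converge(momentum, batch_mean=5.0, tol=0.1):
    # Exponential moving average: moving = moving * m + batch_stat * (1 - m),
    # starting from the initial moving mean of 0
    moving = 0.0
    for step in range(1, 10000):
        moving = moving * momentum + batch_mean * (1 - momentum)
        if abs(moving - batch_mean) < tol:
            return step
    return None

print(batches_to_converge(0.99))  # hundreds of batches before the stats catch up
print(batches_to_converge(0.5))   # only a handful of batches
```

With a high momentum like the Keras default of 0.99, the moving statistics need hundreds of batches to approach the true activation statistics, so early test_on_batch calls normalize with badly stale values; a lower momentum (such as the 0.5 used in the question's generator blocks) adapts far faster.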

