Poor Result with BatchNormalization


Problem Description

I have been trying to implement DCGAN from Facebook's paper, and have been blocked by the two issues below for almost two weeks. Any suggestions would be appreciated. Thanks.

Question 1:

The DCGAN paper suggests using BN (Batch Normalization) in both the generator and the discriminator. However, I couldn't get better results with BN than without it.

I have copied the DCGAN model I used below; it is exactly the same as in the DCGAN paper. I don't think the problem is overfitting, because (1) the output keeps showing the same noise as the initial noise picture and never seems to get trained, and (2) the loss values are very stable: neither the GAN nor the discriminator loss really changes. (Both stay around 0.6-0.7 and never drop or spike the way they do when the models collapse.) Judging by the loss function alone, training appears to be going well.

Question 2:

When I use float16, the model below always gives me NaN. I changed epsilon to 1e-4 and 1e-3, but both failed. And here is one more question: if I don't use BatchNormalization, getting NaN makes sense and I can understand it. But if I do use BatchNormalization, it normalizes every layer. Even if a result becomes a very large or very small number, it gets batch-normalized at every layer, so the outputs should stay roughly centered and the fade-out shouldn't happen, should it? That is my reasoning, but I don't know where it goes wrong. Please, somebody, help me.
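
Here is a minimal NumPy sketch (values chosen only to trigger the overflow) of why BatchNormalization can still produce NaN in float16: the mean and variance are themselves reductions over the activations, and the running sum (or the squared deviations) can overflow float16's maximum of about 65504 even when every individual activation is representable. The normalization then divides infinity by infinity, which is NaN. This is also why mixed-precision setups keep variables and numerically sensitive reductions in float32 while computing most other ops in float16.

import numpy as np

# Each activation fits in float16, but the sum inside the mean overflows.
x = np.array([60000.0, 60000.0], dtype=np.float16)

mean = x.mean(dtype=np.float16)                 # 120000 overflows -> inf
var = ((x - mean) ** 2).mean(dtype=np.float16)  # (60000 - inf)^2 -> inf

# The batch-norm transform then produces -inf / inf = nan.
print(mean, var)                                     # inf inf
print((x - mean) / np.sqrt(var + np.float16(1e-3)))  # [nan nan]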

===== Generator =====

Input # (None, 128) <= latent

Dense # (None, 16384)
BatchNormalization
LeakyReLU

Reshape # (None, 4, 4, 1024)

Conv2DTranspose # (None, 4, 4, 512)

BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 8, 8, 256)

BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 16, 16, 128)

BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 32, 32, 64)

BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 64, 64, 32)

BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 128, 128, 16)

BatchNormalization
LeakyReLU

Conv2D # (None, 128, 128, 3)
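
For reference, here is a minimal tf.keras sketch of the generator as listed. The kernel size (5), the strides, the LeakyReLU slope (0.2), and the final tanh activation are assumptions borrowed from common DCGAN practice, since the listing does not specify them.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator(latent_dim=128, init_stddev=0.02):
    init = tf.keras.initializers.RandomNormal(stddev=init_stddev)
    model = models.Sequential(name="generator")
    # (None, 128) latent -> (None, 16384)
    model.add(layers.Dense(4 * 4 * 1024, kernel_initializer=init,
                           input_shape=(latent_dim,)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Reshape((4, 4, 1024)))
    # Channels halve while stride-2 layers double H and W
    # (the first transpose keeps 4x4, matching the listing).
    for filters, strides in [(512, 1), (256, 2), (128, 2),
                             (64, 2), (32, 2), (16, 2)]:
        model.add(layers.Conv2DTranspose(filters, kernel_size=5, strides=strides,
                                         padding="same", kernel_initializer=init))
        model.add(layers.BatchNormalization())
        model.add(layers.LeakyReLU(0.2))
    # Final (None, 128, 128, 3) image; tanh is the DCGAN convention.
    model.add(layers.Conv2D(3, kernel_size=5, padding="same",
                            activation="tanh", kernel_initializer=init))
    return model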

===== Discriminator =====

Conv2D # (None, 128, 128, 3) LeakyReLU

Conv2D # (None, 64, 64, 16) BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 32, 32, 32)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 16, 16, 64)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 8, 8, 128)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 4, 4, 256)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 2, 2, 512)
BatchNormalization
Dropout
LeakyReLU

Flatten
Dropout
Dense
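
And the matching tf.keras sketch of the discriminator as listed. The kernel size, strides, dropout rate, and the final sigmoid are likewise assumptions the listing does not pin down.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_discriminator(image_shape=(128, 128, 3), init_stddev=0.02,
                        dropout_rate=0.3):
    init = tf.keras.initializers.RandomNormal(stddev=init_stddev)
    model = models.Sequential(name="discriminator")
    # First conv keeps (None, 128, 128, 3); no BatchNormalization here.
    model.add(layers.Conv2D(3, kernel_size=5, strides=1, padding="same",
                            kernel_initializer=init, input_shape=image_shape))
    model.add(layers.LeakyReLU(0.2))
    # Stride-2 convs halve H and W while channels grow 16 -> 512.
    for filters in [16, 32, 64, 128, 256, 512]:
        model.add(layers.Conv2D(filters, kernel_size=5, strides=2, padding="same",
                                kernel_initializer=init))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(dropout_rate))
        model.add(layers.LeakyReLU(0.2))
    model.add(layers.Flatten())
    model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(1, activation="sigmoid", kernel_initializer=init))
    return model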

The last hyperparameters I tried are below, and I didn't forget to add Gaussian noise to the training pictures.

image_shape => (128, 128, 3)
latent_dim => 128
channels => 3
iterations => 10000
batch_size => 128
epsilon => 0.005
weight_init_stddev => 0.02
beta_1 => 0.5
discriminator_lr => 0.0002
gan_lr => 0.0002
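
For completeness, a sketch of how these hyperparameters wire in, assuming Adam as in the DCGAN paper; note that epsilon here is BatchNormalization's epsilon (the one the question says was varied between 1e-4 and 1e-3), not the optimizer's.

from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.optimizers import Adam

# epsilon above is BatchNormalization's epsilon, not Adam's:
bn = BatchNormalization(epsilon=0.005)

# Both optimizers use the DCGAN paper's Adam settings:
discriminator_optimizer = Adam(learning_rate=0.0002, beta_1=0.5)
gan_optimizer = Adam(learning_rate=0.0002, beta_1=0.5)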

Recommended Answer

Use SpectralNorm in the discriminator and SelfModulationBatchNorm in the generator. Or use ConditionalBatchNorm if you have labels.

Code, help with other methods, and GAN training tips can be found here.
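
A sketch of what spectral normalization looks like in the discriminator, assuming the SpectralNormalization wrapper from TensorFlow Addons (the answer does not name a specific implementation); it rescales each wrapped layer's kernel to unit spectral norm, estimated by power iteration, on every forward pass.

import tensorflow as tf
import tensorflow_addons as tfa

# Wrap a discriminator conv so its kernel is constrained to unit
# spectral norm; this stabilizes the discriminator without BN.
sn_conv = tfa.layers.SpectralNormalization(
    tf.keras.layers.Conv2D(64, kernel_size=5, strides=2, padding="same"))

x = tf.random.normal((1, 128, 128, 3))
y = sn_conv(x)  # shape (1, 64, 64, 64)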
