Neural network isn't learning for the first few epochs on Keras

Problem description

I'm testing simple networks on Keras with the TensorFlow backend and I ran into an issue with the sigmoid activation function.

The network isn't learning for the first 5-10 epochs, and then everything is fine. I tried using initializers and regularizers, but that only made it worse.

I use a network like this:

import numpy as np
import keras
from numpy import expand_dims
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot


# load the data (netowork2_ker is the asker's own data-loading helper)
(x_train, y_train), (x_val, y_val), (x_test, y_test) = netowork2_ker.load_data_shared()

# reshape the 50000 training images to shape (50000, 28, 28, 1)
x_train = expand_dims(x_train, 2)
x_train = np.reshape(x_train, (50000, 28, 28))
x_train = expand_dims(x_train, 3)

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

datagen = ImageDataGenerator(
    rescale=1./255,
    width_shift_range=[-1, 0, 1],
    height_shift_range=[-1, 0, 1],
    rotation_range=10)

epochs = 20
batch_size = 50
num_classes = 10

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                 input_shape=x_train.shape[1:],
                 activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3),
                              activation='sigmoid'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100,
                             activation='sigmoid'))
#model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes,
                             activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                    steps_per_epoch=len(x_train) / batch_size, epochs=epochs,
                    verbose=2, shuffle=True)

With the code above I get results like these:

Epoch 1/20 
 - 55s - loss: 2.3098 - accuracy: 0.1036 
Epoch 2/20 
 - 56s - loss: 2.3064 - accuracy: 0.1038
Epoch 3/20 
 - 56s - loss: 2.3068 - accuracy: 0.1025
Epoch 4/20 
 - 56s - loss: 2.3060 - accuracy: 0.1079
...

This goes on for about 7 epochs (a different number each time), and then the loss rapidly drops and I reach 0.9623 accuracy within 20 epochs.

But if I change the activation from sigmoid to relu, it works great and gives me 0.5356 accuracy in the first epoch.

This issue makes sigmoid almost unusable for me, and I'd like to know whether I can do something about it. Is this a bug or am I doing something wrong?

Recommended answer

Activation function suggestion:

In practice, the sigmoid non-linearity has recently fallen out of favor and it is rarely used. ReLU is the most common choice; if a large fraction of the units in your network are "dead", try Leaky ReLU or tanh. Never use sigmoid.
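For reference (not part of the original answer), a minimal sketch of the asker's model with the sigmoid activations swapped for relu, assuming the rest of the pipeline above is unchanged:

model = keras.Sequential()
model.add(keras.layers.Conv2D(64, (3, 3), padding='same',
                              input_shape=x_train.shape[1:],
                              activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Conv2D(100, (3, 3), activation='relu'))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(100, activation='relu'))
# if many units "die" with plain relu, a LeakyReLU layer is one option:
# model.add(keras.layers.Dense(100))
# model.add(keras.layers.LeakyReLU(alpha=0.1))
model.add(keras.layers.Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])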

A very undesirable property of the sigmoid neuron is that when the neuron's activation saturates at either tail of 0 or 1, the gradient at these regions is almost zero. In addition, sigmoid outputs are not zero-centered.
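To make the saturation point concrete, here is a small standalone check (my own illustration, not from the answer): the sigmoid derivative sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at 0.25 and collapses towards zero once the input moves away from the origin.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(x)
    # gradient of the sigmoid at x: 0.25 at x=0, ~0.0066 at x=5, ~4.5e-5 at x=10
    print(f"x={x:5.1f}  sigmoid={s:.5f}  gradient={s * (1 - s):.6f}")

Even at its peak the gradient is only 0.25, so multiplying these factors across several sigmoid layers gives very small weight updates early in training.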
