CNN train accuracy gets better during training, but test accuracy stays around 40%

Problem Description

So in the past few months I've been learning a lot about neural networks with Tensorflow and Keras, so I wanted to try to make a model for the CIFAR10 dataset (code below).

However, during the training process, the accuracy gets better (from about 35% after 1 epoch to about 60-65% after 5 epochs), but the val_acc stays the same or increases only a little. Here are the printed results:

Epoch 1/5
50000/50000 [==============================] - 454s 9ms/step - loss: 1.7761 - acc: 0.3584 - val_loss: 8.6776 - val_acc: 0.4489
Epoch 2/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.3670 - acc: 0.5131 - val_loss: 8.9749 - val_acc: 0.4365
Epoch 3/5
50000/50000 [==============================] - 451s 9ms/step - loss: 1.2089 - acc: 0.5721 - val_loss: 7.7254 - val_acc: 0.5118
Epoch 4/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.1140 - acc: 0.6080 - val_loss: 7.9587 - val_acc: 0.4997
Epoch 5/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.0306 - acc: 0.6385 - val_loss: 7.4351 - val_acc: 0.5321
10000/10000 [==============================] - 27s 3ms/step
loss:  7.435152648162842 
accuracy:  0.5321

I've looked around on the internet, and my best guess is that my model is overfitting, so I've tried removing some layers, adding more dropout layers, and reducing the number of filters, but none of it showed any improvement.

The weirdest thing is that a while ago I made a very similar model, based on some tutorials, which had a final accuracy of 80% after 8 epochs. (I lost that file though)

Here is the code for my model:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import adam
from keras.losses import categorical_crossentropy

model = Sequential()
# first conv block: 256 3x3 filters on the 32x32 RGB input
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 data_format='channels_last',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
# second conv block: 128 2x2 filters
model.add(Conv2D(filters=128,
                 kernel_size=(2, 2),
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
# classifier head
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))


model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=1000,
          epochs=5,
          verbose=1,
          validation_data=(test_images, test_labels))

loss, accuracy = model.evaluate(test_images, test_labels)
print('loss: ', loss, '\naccuracy: ', accuracy)

train_images and test_images are numpy arrays of shape (50000, 32, 32, 3) and (10000, 32, 32, 3), and train_labels and test_labels are numpy arrays of shape (50000, 10) and (10000, 10).
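
For reference, here is a minimal sketch of how arrays with those shapes are typically produced (an assumption on my part, since the question doesn't show the loading code):

from keras.datasets import cifar10
from keras.utils import to_categorical

# load CIFAR-10: images are (N, 32, 32, 3) uint8 arrays, labels are (N, 1) integer class ids
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()

# one-hot encode the labels to shape (N, 10) to match the 10-way softmax output
train_labels = to_categorical(train_labels, 10)
test_labels = to_categorical(test_labels, 10)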

My question: what causes this and what can I do about it?

I changed my model to this:

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',    # better for relu based networks
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))

Now the output is this:

Epoch 1/10
50000/50000 [==============================] - 326s 7ms/step - loss: 1.4916 - acc: 0.4809 - val_loss: 7.7175 - val_acc: 0.5134
Epoch 2/10
50000/50000 [==============================] - 338s 7ms/step - loss: 1.0622 - acc: 0.6265 - val_loss: 6.9945 - val_acc: 0.5588
Epoch 3/10
50000/50000 [==============================] - 326s 7ms/step - loss: 0.8957 - acc: 0.6892 - val_loss: 6.6270 - val_acc: 0.5833
Epoch 4/10
50000/50000 [==============================] - 324s 6ms/step - loss: 0.7813 - acc: 0.7271 - val_loss: 5.5790 - val_acc: 0.6474
Epoch 5/10
50000/50000 [==============================] - 327s 7ms/step - loss: 0.6690 - acc: 0.7668 - val_loss: 5.7479 - val_acc: 0.6358
Epoch 6/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.5671 - acc: 0.8031 - val_loss: 5.8720 - val_acc: 0.6302
Epoch 7/10
50000/50000 [==============================] - 328s 7ms/step - loss: 0.4865 - acc: 0.8319 - val_loss: 5.6320 - val_acc: 0.6451
Epoch 8/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3995 - acc: 0.8611 - val_loss: 5.3879 - val_acc: 0.6615
Epoch 9/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3337 - acc: 0.8837 - val_loss: 5.6874 - val_acc: 0.6432
Epoch 10/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.2806 - acc: 0.9033 - val_loss: 5.7424 - val_acc: 0.6399
10000/10000 [==============================] - 19s 2ms/step
loss:  5.74234927444458 
accuracy:  0.6399

It seems that I'm overfitting again, even though I changed the model with the help I've gotten so far... Any explanations or tips?

The input images are normalized to (0, 1).

Answer

You haven't included how you prepare the data. Here's one addition that made this network learn much better:

# scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

If you do data normalization like that, your network is fine: it hits ~65-70% test accuracy after 5 epochs, which is a good result. Note that 5 epochs is just a start; it would need around 30-50 epochs to really learn the data well and show results close to state of the art.
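
To run that longer training, one option is the standard Keras callbacks below (the patience value and checkpoint filename are my own choices, not from the answer):

from keras.callbacks import EarlyStopping, ModelCheckpoint

# stop once val_loss stops improving, and keep the best weights on disk
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5),
    ModelCheckpoint('best_cifar10.h5', monitor='val_loss', save_best_only=True),
]

model.fit(x_train, y_train,
          batch_size=500,
          epochs=50,  # upper bound; early stopping usually ends the run sooner
          verbose=1,
          validation_data=(x_test, y_test),
          callbacks=callbacks)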

Below are some minor improvements that I noticed, which can get you some extra performance:

  • Since you're using a ReLU-based network, the he_normal initializer is better than glorot_uniform (which is the default in Conv2D).
  • It is strange to decrease the number of filters as you go deeper into the network. You should do the opposite. I changed 256 -> 64 and 128 -> 256, and the accuracy improved.
  • I decreased the dropout slightly, 0.5 -> 0.4.
  • Kernel size 3x3 is more common than 2x2. I think you should try it for the second conv layer as well. In fact, you can play with all the hyper-parameters to find the best combination (a small search sketch follows below).
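
A tiny manual search over two of those hyper-parameters might look like this (illustrative only: build_model is a hypothetical helper that wraps the Sequential definition above, and the candidate values are my own picks, not from the answer):

# try a few kernel sizes and dropout rates, keep the best test accuracy
best_config, best_acc = None, 0.0
for kernel in [(2, 2), (3, 3)]:
    for rate in [0.3, 0.4, 0.5]:
        m = build_model(kernel_size=kernel, dropout=rate)  # hypothetical helper
        m.fit(x_train, y_train, batch_size=500, epochs=5, verbose=0,
              validation_data=(x_test, y_test))
        _, acc = m.evaluate(x_test, y_test, verbose=0)
        if acc > best_acc:
            best_config, best_acc = (kernel, rate), acc
print('best config:', best_config, 'accuracy:', best_acc)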

Here is the final code:

from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.optimizers import adam
from keras.losses import categorical_crossentropy
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(2, 2),
                 kernel_initializer='he_normal',
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=500,
          epochs=5,
          verbose=1,
          validation_data=(x_test, y_test))

loss, accuracy = model.evaluate(x_test, y_test)
print('loss: ', loss, '\naccuracy: ', accuracy)

Results after 5 epochs:

loss:  0.822134458447 
accuracy:  0.7126

By the way, you might be interested to compare your approach with the Keras example CIFAR-10 conv net.
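
That example also trains with real-time data augmentation, which is a common next step against the kind of overfitting seen above; here is a minimal sketch (the specific augmentation parameters are my own choices, not taken from the example):

from keras.preprocessing.image import ImageDataGenerator

# randomly shift and flip the training images on the fly
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True)

model.fit_generator(datagen.flow(x_train, y_train, batch_size=500),
                    steps_per_epoch=len(x_train) // 500,
                    epochs=30,
                    validation_data=(x_test, y_test))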
