为什么我的CIFAR 100 CNN模型主要预测两个类别? [英] Why does my CIFAR 100 CNN model mainly predict two classes?

查看:172
本文介绍了为什么我的CIFAR 100 CNN模型主要预测两个类别?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在尝试在CIFAR 100上使用Keras获得不错的成绩(> 40%准确度).但是,我遇到了CNN模型的怪异行为:它倾向于预测很多类别(2-5)比其他人更常见:

I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR 100. However, I'm experiencing a weird behaviour of a CNN model: It tends to predict some classes (2 - 5) much more often than others:

位置(i,j)处的像素包含计数,预测来自类别i的验证集的多少个元素属于类别j.因此,对角线包含正确的分类,其他所有内容均为错误.两个竖线表示该模型通常可以预测那些类别,尽管事实并非如此.

The pixel at position (i, j) contains the count how many elements of the validation set from class i were predicted to be of class j. Thus the diagonal contains the correct classifications, everything else is an error. The two vertical bars indicate that the model often predicts those classes, although it is not the case.

CIFAR 100达到了完美的平衡:所有100个课程都有500个训练样本.

CIFAR 100 is perfectly balanced: All 100 classes have 500 training samples.

为什么模型比其他类更容易预测某些类?该如何解决?

Why does the model tend to predict some classes MUCH more often than other classes? How can this be fixed?

运行此过程需要一段时间.

Running this takes a while.

#!/usr/bin/env python

from __future__ import print_function
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np

batch_size = 32
nb_classes = 100
nb_epoch = 50
data_augmentation = True

# input image dimensions
img_rows, img_cols = 32, 32
# The CIFAR10 images are RGB.
img_channels = 3

# The data, shuffled and split between train and test sets:
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)

# Shuffle training data
perm = np.arange(len(X_train))
np.random.shuffle(perm)
X_train = X_train[perm]
y_train = y_train[perm]

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_val.shape[0], 'validation samples')
print(X_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Y_val = np_utils.to_categorical(y_val, nb_classes)

model = Sequential()

model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_val /= 255
X_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_val, y_val),
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_val, Y_val))
    model.save('cifar100.h5')

可视化代码

#!/usr/bin/env python


"""Analyze a cifar100 keras model."""

from keras.models import load_model
from keras.datasets import cifar100
from sklearn.model_selection import train_test_split
import numpy as np
import json
import io
import matplotlib.pyplot as plt
try:
    to_unicode = unicode
except NameError:
    to_unicode = str

n_classes = 100


def plot_cm(cm, zero_diagonal=False):
    """Plot a confusion matrix."""
    n = len(cm)
    size = int(n / 4.)
    fig = plt.figure(figsize=(size, size), dpi=80, )
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(cm), cmap=plt.cm.viridis,
                    interpolation='nearest')
    width, height = cm.shape
    fig.colorbar(res)
    plt.savefig('confusion_matrix.png', format='png')

# Load model
model = load_model('cifar100.h5')

# Load validation data
(X, y), (X_test, y_test) = cifar100.load_data()

X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)

# Calculate confusion matrix
y_val_i = y_val.flatten()
y_val_pred = model.predict(X_val)
y_val_pred_i = y_val_pred.argmax(1)
cm = np.zeros((n_classes, n_classes), dtype=np.int)
for i, j in zip(y_val_i, y_val_pred_i):
    cm[i][j] += 1

acc = sum([cm[i][i] for i in range(100)]) / float(cm.sum())
print("Validation accuracy: %0.4f" % acc)

# Create plot
plot_cm(cm)

# Serialize confusion matrix
with io.open('cm.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(cm.tolist(),
                      indent=4, sort_keys=True,
                      separators=(',', ':'), ensure_ascii=False)
    outfile.write(to_unicode(str_))

红鲱鱼

tanh

我已将tanh替换为relu. 历史csv 看起来不错,但是可视化具有相同的问题:

Red herrings

tanh

I've replaced tanh by relu. The history csv looks ok, but the visualization has the same problem:

请注意,此处的验证准确性仅为3.44%.

Please also note that the validation accuracy here is only 3.44%.

删除辍学,用relu替换tanh,在所有地方都将边框模式设置为相同:历史csv

Removing dropout, replacing tanh by relu, setting border mode to same everywhere: history csv

可视化代码的准确性仍然比keras培训代码低得多(这次是8.50%).

The visualization code still gives a much lower accuracy (8.50% this time) than the keras training code.

以下是评论摘要:

  • 数据均匀地分布在各个类上.因此,这两个课程都没有过度训练".
  • 使用了数据扩充,但是没有数据扩充,问题仍然存在.
  • 可视化不是问题.

推荐答案

如果在训练和验证过程中获得了很好的准确性,而在测试时却没有,请确保在两种情况下都对数据集进行完全相同的预处理. 培训时,您在这里:

If you get good accuracy during training and validation, but not when testing, make sure you do exactly the same preprocessing on your dataset in both cases. Here you have when training:

X_train /= 255
X_val /= 255
X_test /= 255

但是在预测混乱矩阵时没有这样的代码.添加到测试中:

But no such code when predicting for your confusion matrix. Adding to testing:

X_val /=  255.

给出以下美观的混淆矩阵:

Gives the following nice looking confusion matrix:

这篇关于为什么我的CIFAR 100 CNN模型主要预测两个类别?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆