Why does my CIFAR 100 CNN model mainly predict two classes?

Views: 20 | Published: 2021/12/31 17:07:16 | Tags: neural-network, computer-vision, keras


Question

I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR 100. However, I'm seeing a weird behaviour from my CNN model: it predicts a handful of classes (2 to 5 of them) much more often than the others, as the confusion matrix of the validation set shows.

The pixel at position (i, j) holds the number of validation samples of class i that were predicted as class j. The diagonal therefore contains the correct classifications, and everything off the diagonal is an error. The two vertical bars show that the model predicts those two classes for many samples that actually belong to other classes.
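The confusion matrix described above can be built without a Python loop, and its column sums show directly which classes are over-predicted (the "vertical bars"). This is a minimal NumPy sketch: `confusion_matrix` and `overpredicted_classes` are hypothetical helper names, and the factor-of-2 threshold is an arbitrary assumption.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Return an (n_classes, n_classes) count matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j,
    so the diagonal holds the correct classifications.
    """
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats.
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def overpredicted_classes(cm, factor=2.0):
    """Classes predicted more than `factor` times as often as they occur."""
    predicted = cm.sum(axis=0)  # column sums: how often each class is predicted
    actual = cm.sum(axis=1)     # row sums: how often each class really occurs
    return np.flatnonzero(predicted > factor * actual)


# Toy example with 3 classes where the model collapses onto class 2.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 2, 2, 2, 2, 2])
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
print(overpredicted_classes(cm))  # [2]
print(np.trace(cm) / cm.sum())    # accuracy: correct predictions / all predictions
```

A single bright column, as in the toy example, is exactly the pattern that shows up as a vertical bar in the plotted matrix.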

CIFAR 100 is perfectly balanced: All 100 classes have 500 training samples.
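A quick way to confirm the balance is `np.bincount` on the labels. The sketch below assumes synthetic labels standing in for `cifar100.load_data()` (100 classes x 500 samples, shape `(N, 1)` as Keras returns them) and a hypothetical `class_counts` helper. It also shows that a random, non-stratified 80/20 split such as the one in the training code is only approximately balanced; passing `stratify=y` to scikit-learn's `train_test_split` should give exactly 400 training samples per class here.

```python
import numpy as np


def class_counts(labels, n_classes):
    # Keras returns CIFAR labels with shape (N, 1); flatten before counting.
    return np.bincount(np.asarray(labels).ravel(), minlength=n_classes)


# Synthetic stand-in for the CIFAR-100 training labels: 100 classes x 500 samples.
y = np.repeat(np.arange(100), 500).reshape(-1, 1)
counts = class_counts(y, 100)
print(counts.min(), counts.max())  # 500 500

# A random (non-stratified) 80/20 split is only roughly balanced.
rng = np.random.default_rng(42)
train_idx = rng.permutation(len(y))[: int(0.8 * len(y))]
train_counts = class_counts(y[train_idx], 100)
print(train_counts.sum(), train_counts.min(), train_counts.max())
```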

Why does the model tend to predict some classes MUCH more often than other classes? How can this be fixed?

The code

Running this takes a while.

#!/usr/bin/env python

from __future__ import print_function
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from sklearn.model_selection import train_test_split
import numpy as np

batch_size = 32
nb_classes = 100
nb_epoch = 50
data_augmentation = True

# input image dimensions
img_rows, img_cols = 32, 32
# The CIFAR-100 images are RGB.
img_channels = 3

# Load the data and hold out 20% of the training set for validation:
(X, y), (X_test, y_test) = cifar100.load_data()
X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)

# Shuffle training data
perm = np.arange(len(X_train))
np.random.shuffle(perm)
X_train = X_train[perm]
y_train = y_train[perm]

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_val.shape[0], 'validation samples')
print(X_test.shape[0], 'test samples')

# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
Y_val = np_utils.to_categorical(y_val, nb_classes)

model = Sequential()

model.add(Convolution2D(32, 3, 3, border_mode='same',
                        input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_val /= 255
X_test /= 255

if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_val, Y_val),  # one-hot labels, as for training
              shuffle=True)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images horizontally
        vertical_flip=False)  # do not flip images vertically

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(X_train, Y_train,
                                     batch_size=batch_size),
                        samples_per_epoch=X_train.shape[0],
                        nb_epoch=nb_epoch,
                        validation_data=(X_val, Y_val))

# Save the model in both branches so the visualization script can load it.
model.save('cifar100.h5')

Visualization code

#!/usr/bin/env python


"""Analyze a cifar100 keras model."""

from keras.models import load_model
from keras.datasets import cifar100
from sklearn.model_selection import train_test_split
import numpy as np
import json
import io
import matplotlib.pyplot as plt
try:
    to_unicode = unicode
except NameError:
    to_unicode = str

n_classes = 100


def plot_cm(cm, zero_diagonal=False):
    """Plot a confusion matrix."""
    n = len(cm)
    size = int(n / 4.)
    fig = plt.figure(figsize=(size, size), dpi=80, )
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(1)
    res = ax.imshow(np.array(cm), cmap=plt.cm.viridis,
                    interpolation='nearest')
    width, height = cm.shape
    fig.colorbar(res)
    plt.savefig('confusion_matrix.png', format='png')

# Load model
model = load_model('cifar100.h5')

# Load validation data
(X, y), (X_test, y_test) = cifar100.load_data()

X_train, X_val, y_train, y_val = train_test_split(X, y,
                                                  test_size=0.20,
                                                  random_state=42)

# Calculate confusion matrix
y_val_i = y_val.flatten()
y_val_pred = model.predict(X_val)
y_val_pred_i = y_val_pred.argmax(1)
cm = np.zeros((n_classes, n_classes), dtype=int)  # np.int was removed in NumPy 1.24
for i, j in zip(y_val_i, y_val_pred_i):
    cm[i][j] += 1

acc = sum([cm[i][i] for i in range(100)]) / float(cm.sum())
print("Validation accuracy: %0.4f" % acc)

# Create plot
plot_cm(cm)

# Serialize confusion matrix
with io.open('cm.json', 'w', encoding='utf8') as outfile:
    str_ = json.dumps(cm.tolist(),
                      indent=4, sort_keys=True,
                      separators=(',', ':'), ensure_ascii=False)
    outfile.write(to_unicode(str_))

Red herrings

tanh

I've replaced tanh with relu. The training history CSV looks OK, but the confusion matrix shows the same problem.

Please also note that the validation accuracy computed by the visualization script is only 3.44%.

Dropout + tanh + border mode

Next, I removed dropout, replaced tanh with relu, and set border_mode='same' everywhere, again recording the training history as a CSV.

The visualization code still gives a much lower accuracy (8.50% this time) than the accuracy Keras reports during training.
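When the accuracy reported during training and the accuracy from a separate evaluation script disagree this much, a cheap guard on the inputs can point at the cause. This is a sketch under assumptions: `input_stats` and `check_same_scale` are hypothetical helpers, and comparing the largest absolute value with a 50% relative tolerance is a crude heuristic that is just enough to tell [0, 1] inputs from raw [0, 255] pixels. In the real scripts you would pass the array used for training and the array passed to `model.predict`.

```python
import numpy as np


def input_stats(x):
    """Summarize an input batch: dtype and value range."""
    x = np.asarray(x)
    return {'dtype': str(x.dtype), 'min': float(x.min()), 'max': float(x.max())}


def check_same_scale(x_fit, x_eval, rel_tol=0.5):
    """Raise ValueError if two input batches are on clearly different scales."""
    a = float(np.abs(x_fit).max())
    b = float(np.abs(x_eval).max())
    if abs(a - b) > rel_tol * max(a, b, 1e-8):
        raise ValueError(f'input scale mismatch: fit {input_stats(x_fit)}, '
                         f'eval {input_stats(x_eval)}')


# Synthetic uint8 batch standing in for CIFAR-100 images, shape (N, 32, 32, 3).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
scaled = raw.astype('float32') / 255.0

check_same_scale(scaled, scaled)   # same preprocessing: passes silently
try:
    check_same_scale(scaled, raw)  # raw pixels at evaluation time: caught
except ValueError as err:
    print(err)
```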

Q & A

The following is a summary of the comments:

The data is evenly distributed over the classes. So there is no "over training" of those two classes.
Data augmentation is used, but without data augmentation the problem persists.
The visualization is not the problem.

Solution

If you get good accuracy during training and validation but not when testing, make sure you apply exactly the same preprocessing to your data in both cases. In your training script you have:

X_train /= 255
X_val /= 255
X_test /= 255

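But the script that computes your confusion matrix has no such code. Add the same scaling there, before calling model.predict(X_val). Because cifar100.load_data() returns uint8 arrays, convert to float first; dividing a uint8 array in place raises a TypeError in NumPy:

X_val = X_val.astype('float32') / 255.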

This gives a nice-looking confusion matrix.
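One way to guarantee that the training and evaluation scripts never drift apart is to keep the scaling in a single function that both scripts import. This is a minimal sketch: `preprocess` is a hypothetical helper name, and the random uint8 batch is a stand-in for real CIFAR-100 images. It also demonstrates why the conversion to float has to come before the division.

```python
import numpy as np


def preprocess(x):
    """Convert raw uint8 CIFAR images to float32 in [0, 1].

    Hypothetical shared helper: call it in the training script and in the
    visualization script instead of repeating the scaling by hand.
    """
    return np.asarray(x).astype('float32') / 255.0


# Synthetic uint8 batch standing in for CIFAR-100 images, shape (N, 32, 32, 3).
rng = np.random.default_rng(0)
x_raw = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
x = preprocess(x_raw)
print(x.dtype, float(x.min()), float(x.max()))

# Dividing the raw uint8 array in place fails, because the float result
# cannot be stored back into uint8 storage.
try:
    x_raw /= 255
    inplace_ok = True
except TypeError:
    inplace_ok = False
print('in-place division on uint8 allowed:', inplace_ok)
```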
