Keras:分类报告的准确性在不同模型之间是不同的.预测多类的准确性 [英] Keras: Classification report accuracy is different between model.predict accuracy for multiclass

查看:82
本文介绍了Keras:分类报告的准确性在不同模型之间是不同的.预测多类的准确性的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Colab链接位于此处:

Colab link is here:

数据导入之前是

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    main_folder,
    validation_split=0.1,
    subset="training",
    label_mode='categorical',
    seed=123,
    image_size=(dim, dim))

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    main_folder,
    validation_split=0.1,
    subset="validation",
    label_mode='categorical',
    seed=123,
    image_size=(dim, dim))

该模型是通过以下方式训练的

The model is trained the following way

model = tf.keras.models.Sequential([
    tf.keras.layers.experimental.preprocessing.Rescaling(1. / 255),
    ...
    tf.keras.layers.Dense(2, activation='softmax')
])

model.compile(optimizer="adam", loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['accuracy'])

我正在努力获取正确的预测类别和正确的 true_categories 来使分类报告生效:

I am struggling with getting the right predicted categories and right true_categories to get the classification report to work:

y_pred = model.predict(val_ds, batch_size=1)
predicted_categories = np.argmax(y_pred, axis=1)

true_categories = tf.concat([y for x, y in val_ds], axis=0).numpy()
true_categories_argmax = np.argmax(true_categories, axis=1)

print(classification_report(true_categories_argmax, predicted_categories))

当前时间段的输出与分类报告相矛盾

At the moment the output of the epoch is contradicting the classification report

Epoch 22/75
144/144 [==============================] - 7s 48ms/step - loss: 0.0611 - accuracy: 0.9776 - val_loss: 0.0768 - val_accuracy: 0.9765

模型上的验证集返回

model.evaluate(val_ds)

[==============================] - 0s 16ms/step - loss: 0.0696 - accuracy: 0.9784
[0.06963862478733063, 0.9784313440322876]

分类报告非常不同:

          precision    recall  f1-score   support
     0.0       0.42      0.44      0.43       221
     1.0       0.56      0.54      0.55       289
    accuracy                           0.49       510
   macro avg       0.49      0.49      0.49       510
weighted avg       0.50      0.49      0.50       510

类似的问题此处此处此处

Similiar questions here, here, here, here, here with no answers to this issue.

推荐答案

您设置了 label_mode ='categorical',则这是一个多类分类,您需要在最后一个密集层中使用 softmax 激活.因为softmax强制输出和等于1.您可以将它们解释为概率.使用 sigmoid ,将无法找到主要类.它可以不受限制地分配任何值.

You set label_mode='categorical' then this is a multi-class classification and you need to use softmax activation in your last dense layer. Because softmax force the outputs sum to be equal to 1. You can kinda interpret them as probabilities. With sigmoid it will not be possible to find the dominant class. It can assign any values without restriction.

我模型的最后一层:密集(5,激活='softmax')

我的模型丢失: loss = tf.keras.losses.CategoricalCrossentropy(),与您的相同.在这种情况下,标签是一键编码的.

My model's loss: loss=tf.keras.losses.CategoricalCrossentropy(), same as yours. Labels are one-hot-encoded in this case.

说明:我出于演示目的使用了5类分类,但是它遵循相同的逻辑.

Explanation: I used a 5 class classification for demo purposes, but it follows the same logic.

y_pred = model.predict(val_ds)

y_pred[:2]
>>> array([[0.28257513, 0.4343998 , 0.18222839, 0.04164065, 0.05915598],
       [0.36404607, 0.08850227, 0.15335019, 0.21602921, 0.17807229]],
      dtype=float32)

这表明每个类的概率,例如,第一个示例的概率为43%属于类2.您需要使用 argmax 来查找类索引.

This incidates each classes probabilities, for example first example has a probability of 43% being belong to class 2. You need to use argmax to find class index.

predicted_categories = np.argmax(y_pred, axis = 1)
predicted_categories[:2]

array([1, 0])

我们现在有了预测的班级.现在需要获取真实的类.

We now have the predicted classes. Now need to obtain true classes.

true_categories = tf.concat([y for x, y in val_ds], axis = 0).numpy() # convert to np array

    true_categories[:2]
>>> array([[1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1.]], dtype=float32)

如果将其输入分类报告,则会得到以下信息:

If you feed this into classification report, you will get following:

ValueError: Classification metrics can't handle a mix of multilabel-indicator and multiclass targets

我们还需要做:

    true_categories_argmax = np.argmax(true_categories, axis = 1)
    true_categories_argmax[:2]
>>> array([0, 4])

现在可以进行比较了.

print(classification_report(true_categories_argmax, predicted_categories))

这应该产生预期的结果:

That should produce the expected result:

      precision    recall  f1-score   support

   0       0.55      0.43      0.48       129
   1       0.53      0.83      0.64       176
   2       0.48      0.56      0.52       120
   3       0.75      0.72      0.73       152
   4       0.66      0.31      0.42       157

编辑:随着 tf.keras.preprocessing.image_dataset_from_directory 设置 shuffle = True ,类可能会被改组.对于 val_ds ,请尝试设置 shuffle = False .像这样:

Classes might get shuffled as tf.keras.preprocessing.image_dataset_from_directory sets shuffle = True. For val_ds try to set shuffle = False. Like this:

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    main_folder,
    validation_split=0.1,
    subset="validation",
    shuffle = False,
    label_mode='categorical',
    seed=123,
    image_size=(dim, dim))

Edit2 :这是我想到的:

prediction_classes = np.array([])
true_classes =  np.array([])

for x, y in val_ds:
  prediction_classes = np.concatenate([prediction_classes,
                       np.argmax(model.predict(x), axis = -1)])
  true_classes = np.concatenate([true_classes, np.argmax(y.numpy(), axis=-1)])

分类报告:

print(classification_report(true_classes, prediction_classes))

              precision    recall  f1-score   support

         0.0       0.74      0.81      0.77      1162
         1.0       0.80      0.72      0.75      1179

    accuracy                           0.77      2341
   macro avg       0.77      0.77      0.76      2341
weighted avg       0.77      0.77      0.76      2341

这篇关于Keras:分类报告的准确性在不同模型之间是不同的.预测多类的准确性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆