How to load and evaluate a CNN using a test set in TensorFlow?

Question

I'm trying to train a CNN on a set of images. There are 2 folders: training_set and test_set, each containing 2 classes. They look like this:

training_set/
    classA/
        img1.png
        img2.png
        ...
    classB/
        img1.png
        img2.png
        ...

test_set/
    classA/
        img1.png
        img2.png
        ...
    classB/
        img1.png
        img2.png
        ...

The code looks like this; the training set is split into a training and a validation subset:

import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.client import device_lib 
import numpy as np
import matplotlib.pyplot as plt
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
print(device_lib.list_local_devices())

# Set image properties
img_height = 369
img_width = 496
batch_size = 32

# Import data set from directory
train_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_training_set",
    labels='inferred',
    label_mode="binary", # not sure about this one though, as the classes are not called '0' and '1'
    class_names = ['classA', 'classB'],
    color_mode =  'rgb',
    batch_size = batch_size,
    image_size = (img_height, img_width),
    shuffle = True,
    seed = 123,
    validation_split = 0.2,
    subset = "training"
)

val_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_training_set",
    labels='inferred',
    label_mode="binary", # not sure about this one though, as the classes are not called '0' and '1'
    class_names = ['classA', 'classB'],
    color_mode =  'rgb',
    batch_size = batch_size,
    image_size = (img_height, img_width),
    shuffle = True,
    seed = 123,
    validation_split = 0.2,
    subset = "validation"
)
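The comment on label_mode="binary" asks how the folder names turn into numeric labels. As documented for image_dataset_from_directory, with labels='inferred' each subdirectory is one class, and a class's label is its position in class_names (or in the alphanumerically sorted subdirectory names when class_names is omitted). With label_mode="binary" that index is delivered as a float32 value of 0 or 1, so here classA becomes 0 and classB becomes 1. The plain-Python sketch below mimics only that mapping; infer_label_map is a hypothetical helper, not a TensorFlow function.

```python
import os
import tempfile


def infer_label_map(directory, class_names=None):
    """Hypothetical helper mimicking how labels='inferred' assigns label indices.

    Each subdirectory of `directory` is one class. If `class_names` is given,
    its order decides the indices; otherwise the subdirectory names are sorted
    alphanumerically. The index is returned as a float, like a binary label.
    """
    subdirs = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    if class_names is None:
        class_names = subdirs
    elif set(class_names) != set(subdirs):
        raise ValueError("class_names must match the subdirectory names")
    return {name: float(index) for index, name in enumerate(class_names)}


# Build a throwaway copy of the training_set/ layout from the question.
with tempfile.TemporaryDirectory() as root:
    for name in ("classA", "classB"):
        os.makedirs(os.path.join(root, name))
    print(infer_label_map(root))                        # {'classA': 0.0, 'classB': 1.0}
    print(infer_label_map(root, ["classB", "classA"]))  # {'classB': 0.0, 'classA': 1.0}
```

Reordering class_names is therefore the way to choose which folder is treated as label 1.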

Then:

from matplotlib import pyplot

img_height = 369
img_width = 496
epochs = 25

model = tf.keras.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(img_height, img_width, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# Since we have two classes:
model.add(layers.Dense(1, activation='sigmoid'))

# BinaryCrossentropy because there are 2 classes 
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=optimizer, loss=tf.keras.losses.BinaryCrossentropy(from_logits=False), metrics=['accuracy'])

# Feed the model
history = model.fit(train_images, epochs=epochs, batch_size=32, verbose=1, validation_data=val_images)

# Plot
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
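The model ends in a single sigmoid unit, so its output is a probability that the image has label 1 (classB), and BinaryCrossentropy(from_logits=False) is told to expect such probabilities rather than raw logits. The NumPy sketch below shows roughly what that loss computes for one batch. binary_crossentropy here is a hypothetical stand-alone function, not the Keras implementation, and the clipping constant eps=1e-7 is an assumption chosen to match Keras' default backend epsilon.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for sigmoid outputs (probabilities, not logits).

    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return losses.mean()


# Two images: a classB image (label 1) scored 0.9, a classA image (label 0) scored 0.2.
print(round(binary_crossentropy([1.0, 0.0], [0.9, 0.2]), 4))  # 0.1643
```

Confident correct predictions push the loss toward 0, which is also why thresholding the same probabilities at 0.5 later recovers a class.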

Now that the model is trained, the script plots the training and validation accuracy and loss. I try to load my test set using:

test_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_test_set",
    labels='inferred',
    label_mode="binary",
    class_names = ['classA', 'classB'],
    color_mode =  'rgb',
    batch_size = batch_size, # not really applicable as I want to use the whole set?
    image_size = (img_height, img_width),
    shuffle = True,
    seed = 123,
    validation_split = None
)

But is this the correct way? How do I deal with the batch_size? I think I'd evaluate the model with my test set using:

test_loss, test_acc = model.evaluate(test_images, verbose=2)
print('\nTest accuracy:', test_acc)
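On the batch_size question: model.evaluate(test_images) runs over every batch the dataset yields, including a smaller final batch, and Keras metrics accumulate their state across batches, so the reported accuracy covers the whole test set whatever batch size was used; the batch size mainly affects memory use and speed. The NumPy sketch below imitates that accumulation with made-up labels and probabilities. streaming_accuracy is hypothetical and a simplification, not Keras' actual metric code.

```python
import numpy as np


def streaming_accuracy(y_true, y_prob, batch_size, threshold=0.5):
    """Accumulate correct/total counts batch by batch, like a stateful metric."""
    correct = 0
    total = 0
    for start in range(0, len(y_true), batch_size):
        batch_true = y_true[start:start + batch_size]
        batch_pred = (y_prob[start:start + batch_size] >= threshold).astype(int)
        correct += int(np.sum(batch_pred == batch_true))
        total += len(batch_true)
    return correct / total


rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=100)  # 100 fake test labels (0 = classA, 1 = classB)
y_prob = rng.random(100)               # 100 fake sigmoid outputs

# 100 is not a multiple of 32, so with batch_size=32 the last batch has 4 images.
for bs in (1, 32, 100):
    print(bs, streaming_accuracy(y_true, y_prob, bs))
# Every batch size prints the same accuracy.
```

Averaging the per-batch accuracies without weighting them by batch size would, by contrast, give the short final batch as much weight as a full one.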

But I don't think this is sufficient, as I'd like the accuracy, precision, recall, and F1-score. I'm also not sure the test set is being loaded correctly.

So basically: How do I load my test set and calculate accuracy, precision, recall and F1-score?

Answer

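You need to iterate over the test dataset and collect the predictions and the true labels in the same loop. Because test_images was created with shuffle=True, each pass over it can yield the images in a different order, so predictions and labels gathered in separate passes (for example model.predict(test_images) followed by a second loop for the labels) may not line up; taking both from the same batch avoids that. Alternatively, load the test set with shuffle=False.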

predicted_probs = np.array([])
true_classes = np.array([])

for images, labels in test_images:
  # model(...) returns a (batch, 1) tensor of sigmoid outputs; flatten it to 1-D
  # so it can be concatenated with the 1-D accumulator
  predicted_probs = np.concatenate([predicted_probs,
                       model(images, training=False).numpy().ravel()])
  # with label_mode="binary" the labels also have shape (batch, 1)
  true_classes = np.concatenate([true_classes, labels.numpy().ravel()])

Since these are sigmoid outputs, you need to turn them into classes with a threshold, here 0.5:

predicted_classes = (predicted_probs >= 0.5).astype(int)

After that you can compute the confusion matrix and derive the other metrics from it:

conf_matrix = tf.math.confusion_matrix(true_classes, predicted_classes)
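The answer stops at the confusion matrix, but the metrics the question asks for follow directly from its four cells. tf.math.confusion_matrix puts true classes on the rows and predicted classes on the columns, so with classB (label 1) treated as the positive class the cells are TN, FP in the first row and FN, TP in the second. A minimal NumPy sketch assuming that layout; binary_metrics is a hypothetical helper and the counts in the example are made up.

```python
import numpy as np


def binary_metrics(conf_matrix):
    """Accuracy, precision, recall and F1 from a 2x2 confusion matrix.

    Assumes rows = true class, columns = predicted class, class 1 = positive.
    Precision, recall or F1 fall back to 0.0 when their denominator is zero.
    """
    cm = np.asarray(conf_matrix, dtype=np.float64)
    tn, fp = cm[0, 0], cm[0, 1]
    fn, tp = cm[1, 0], cm[1, 1]
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"accuracy": float(accuracy), "precision": float(precision),
            "recall": float(recall), "f1": float(f1)}


# Made-up counts: 40 TN, 10 FP, 5 FN, 45 TP.
print(binary_metrics([[40, 10], [5, 45]]))
# accuracy 0.85, precision ~0.818, recall 0.9, F1 ~0.857
```

With the variables from the answer this would be called as binary_metrics(conf_matrix.numpy()). If scikit-learn is installed, sklearn.metrics.precision_recall_fscore_support(true_classes, predicted_classes, average='binary') should give the same precision, recall, and F1.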
