How to load and evaluate a CNN using a test set in TensorFlow?
Question
I'm trying to train a CNN on a set of images. There are 2 folders: training_set and test_set, each containing 2 classes. They look like this:
training_set/
    classA/
        img1.png
        img2.png
        ...
    classB/
        img1.png
        img2.png
        ...
test_set/
    classA/
        img1.png
        img2.png
        ...
    classB/
        img1.png
        img2.png
        ...
The code looks like this; the training set is split into training and validation subsets:
import os
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.client import device_lib
import numpy as np
import matplotlib.pyplot as plt
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
print(device_lib.list_local_devices())
# Set image properties
img_height = 369
img_width = 496
batch_size = 32
# Import data set from directory
train_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_training_set",
    labels='inferred',
    label_mode="binary",  # not sure about this one though, as the classes are not called '0' and '1'
    class_names=['classA', 'classB'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=(img_height, img_width),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="training"
)
val_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_training_set",
    labels='inferred',
    label_mode="binary",  # not sure about this one though, as the classes are not called '0' and '1'
    class_names=['classA', 'classB'],
    color_mode='rgb',
    batch_size=batch_size,
    image_size=(img_height, img_width),
    shuffle=True,
    seed=123,
    validation_split=0.2,
    subset="validation"
)
Then:
from matplotlib import pyplot
img_height = 369
img_width = 496
epochs = 25
model = tf.keras.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(img_height, img_width, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# Since we have two classes:
model.add(layers.Dense(1, activation='sigmoid'))
# BinaryCrossentropy because there are 2 classes
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(optimizer=optimizer, loss=tf.keras.losses.BinaryCrossentropy(from_logits=False), metrics=['accuracy'])
# Train the model (the datasets are already batched, so batch_size is not passed to fit)
history = model.fit(train_images, epochs=epochs, verbose=1, validation_data=val_images)
# Plot
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
Now that the model is trained, it shows plots of the training and validation accuracy and loss. I try to load my test set using:
test_images = tf.keras.preprocessing.image_dataset_from_directory(
    "path_to_test_set",
    labels='inferred',
    label_mode="binary",
    class_names=['classA', 'classB'],
    color_mode='rgb',
    batch_size=batch_size,  # not really applicable as I want to use the whole set?
    image_size=(img_height, img_width),
    shuffle=True,
    seed=123,
    validation_split=None
)
But is this the correct way? How do I deal with the batch_size? I think I'd evaluate the model with my test set using:
test_loss, test_acc = model.evaluate(test_images, verbose=2)
print('\nTest accuracy:', test_acc)
but I don't think this is sufficient as I'd like the accuracy, precision, recall and F1-score. I'm also not even sure the right thing is happening here (with how the test set is loaded).
So basically: How do I load my test set and calculate accuracy, precision, recall and F1-score?
Recommended answer
You need to iterate over the dataset batch by batch, collecting the predictions and the true classes as you go.
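predicted_probs = np.empty((0, 1))  # sigmoid outputs, one column per sample
true_classes = np.array([])
for images, labels in test_images:
    # model(images) returns a (batch, 1) tensor; convert it to NumPy before concatenating
    batch_probs = model(images, training=False).numpy()
    predicted_probs = np.concatenate([predicted_probs, batch_probs])
    # with label_mode="binary" the labels also have shape (batch, 1); flatten them to 1-D
    true_classes = np.concatenate([true_classes, labels.numpy().ravel()])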
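On the batch_size question: the batch size only controls how many images are processed per step, so the collected predictions and labels should not depend on it, as long as each batch's predictions are paired with that same batch's labels. Below is a minimal NumPy-only sketch of that idea. `fake_model`, `iter_batches` and `collect` are hypothetical stand-ins for the trained model and the test dataset, not TensorFlow APIs.

```python
import numpy as np

def fake_model(images):
    """Hypothetical stand-in for the trained CNN: returns (batch, 1) 'probabilities'."""
    # Mean pixel value squashed through a sigmoid, purely for illustration.
    means = images.reshape(len(images), -1).mean(axis=1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-means))

def iter_batches(images, labels, batch_size):
    """Hypothetical stand-in for iterating over a batched dataset."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size], labels[start:start + batch_size]

def collect(images, labels, batch_size):
    """Gather predictions and labels batch by batch, then concatenate once."""
    probs, trues = [], []
    for batch_images, batch_labels in iter_batches(images, labels, batch_size):
        probs.append(fake_model(batch_images))  # shape (batch, 1)
        trues.append(batch_labels.ravel())      # flatten (batch, 1) to (batch,)
    return np.concatenate(probs).ravel(), np.concatenate(trues)

rng = np.random.default_rng(0)
images = rng.normal(size=(10, 4, 4, 3))  # 10 toy "images"
labels = rng.integers(0, 2, size=(10, 1)).astype(float)

probs_small, true_small = collect(images, labels, batch_size=3)
probs_large, true_large = collect(images, labels, batch_size=10)
print(np.allclose(probs_small, probs_large))  # same result for both batch sizes
```

Shuffling the test set does no harm when predictions and labels come from the same pass, as here. It becomes a problem only if they are gathered in separate passes over a dataset that reshuffles on each iteration, so `shuffle = False` is a reasonable choice for the test set.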
Since these are sigmoid outputs, you need to convert them into class labels with a threshold, 0.5 here:
predicted_classes = [1 * (x[0]>=0.5) for x in predicted_probs]
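The same thresholding can be written as one vectorized NumPy expression. This is a small sketch with made-up probabilities; the 0.5 cut-off is the one used above, and the toy values are assumptions for illustration only.

```python
import numpy as np

# Toy sigmoid outputs with shape (N, 1), the shape a Dense(1, activation='sigmoid') layer produces.
predicted_probs = np.array([[0.10], [0.50], [0.73], [0.49]])

threshold = 0.5  # values at or above the threshold map to class 1
predicted_classes = (predicted_probs.ravel() >= threshold).astype(int)
print(predicted_classes)  # [0 1 1 0]
```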
After that you can get the confusion matrix etc:
conf_matrix = tf.math.confusion_matrix(true_classes, predicted_classes)
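The question also asks for accuracy, precision, recall and F1-score. These follow directly from the four confusion-matrix counts. The sketch below computes them with plain NumPy; `binary_metrics` is a hypothetical helper name, the toy labels are invented, and class 1 (here `classB`, given the `class_names` order) is assumed to be the positive class.

```python
import numpy as np

def binary_metrics(true_classes, predicted_classes):
    """Accuracy, precision, recall and F1 for labels in {0, 1}, with class 1 as positive."""
    y_true = np.asarray(true_classes).astype(int).ravel()
    y_pred = np.asarray(predicted_classes).astype(int).ravel()

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

    # Guard each ratio against a zero denominator.
    accuracy = (tp + tn) / max(len(y_true), 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy example: 2 true positives, 2 true negatives, 1 false positive, 1 false negative.
metrics = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

In `tf.math.confusion_matrix` the rows are the true labels and the columns are the predictions, so for two classes the matrix reads `[[tn, fp], [fn, tp]]` and the same counts can be taken from it instead.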