如何为此CNN使用K折交叉验证? [英] How to use K-Fold Cross Validation For This CNN?

查看:164
本文介绍了如何为此CNN使用K折交叉验证?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经尝试为我的二进制图像分类器实现K折交叉验证,但是由于我一直陷于整个数据处理方面,所以我已经苦苦挣扎了一段时间.在尝试使用K Fold之前,我在下面包括了我的代码(这很长而且很乱-致歉),因为它出现了严重错误.任何建议或支持,将不胜感激.我认为在这里使用K折是正确的方法,但是如果不正确,请告诉我.非常感谢!

I have tried to implement K Fold Cross Validation for my binary image classifier, but I have been struggling for a while as I have been stuck with the whole data processing side of things. I have included my code below (it is quite long and messy - apologies) before my attempts at the K Fold as it went horribly wrong. Any suggestions or support would be greatly appreciated. I believe that using a K Fold is the right approach here, but if not, please let me know. Thank you so much!

我想知道如何重新格式化数据以创建单独的折叠,因为那里的每个教程几乎都使用.csv文件;但是,我只是有两个包含图像的文件夹,这些文件夹被分为两个单独的类别(用于训练数据)或一个单独的类别(用于测试数据).

I was wondering how I can reformat my data to create the separate folds as pretty much every tutorial out there uses a .csv file; however, I simply have two different folders containing images, either ordered into two separate categories (for the training data) or just one singular category (for the test data).

from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout
from keras.regularizers import l2
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import EarlyStopping
import numpy as np
import matplotlib.pyplot as plt

classifier = Sequential()
classifier.add(Conv2D(32, (3 , 3), input_shape = (256, 256, 3), activation = 'relu', kernel_regularizer=l2(0.01)))
classifier.add(MaxPooling2D(pool_size=(2,2)))
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation='relu'))
classifier.add(Dropout(0.5))
classifier.add(Dense(units=1, activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy',metrics=['accuracy'])

train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, validation_split=0.2)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    'train',
    target_size=(256, 256),
    batch_size=32,
    class_mode='binary',
    subset='training') # set as training data

validation_generator = train_datagen.flow_from_directory(
    'train', # same directory as training data
    target_size=(256, 256),
    batch_size=32,
    class_mode='binary',
    subset='validation')
test_set = test_datagen.flow_from_directory('test', target_size = (256,256), batch_size=10, class_mode='binary')

history = classifier.fit_generator(train_generator, steps_per_epoch=40, epochs=100, validation_data=validation_generator)
classifier.save('50epochmodel')

test_images = np.array(list(next(test_set)[:1]))[0]
probabilities = classifier.predict(test_images)

推荐答案

要获得更大的灵活性,您可以对文件使用简单的加载功能,而不必使用Keras生成器.然后,您可以遍历文件列表并针对其余折叠进行测试.

For more flexibility you can use a simple loading function for files, rather than using a Keras generator. Then, you can iterate through a list of files and test against the remaining fold.

import os
os.chdir(r'catsanddogs')
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
from collections import deque
from glob2 import glob
import numpy as np

files = glob('*\\*\\*.jpg')
files = files[:-(len(files)%3)] # dataset is now divisible by 3 

indices = np.random.permutation(len(files)).reshape(3, -1)

imsize = 64


def load(file_path):
    img = tf.io.read_file(file_path)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, size=(imsize, imsize))
    label = tf.strings.split(file_path, os.sep)[1]
    label = tf.cast(tf.equal(label, 'dogs'), tf.int32)
    return img, label

accuracies_on_test_set = {}

for i in range(len(indices)):
    d = deque(np.array(files)[indices].tolist())
    d.rotate(-i)
    train1, train2, test1 = d
    train_ds = tf.data.Dataset.from_tensor_slices(train1 + train2).\
        shuffle(len(train1) + len(train2)).map(load).batch(4)
    test_ds = tf.data.Dataset.from_tensor_slices(test1).\
        shuffle(len(test1)).map(load).batch(4)

    classifier = Sequential()
    classifier.add(Conv2D(8, (3, 3), input_shape=(imsize, imsize, 3), activation='relu'))
    classifier.add(MaxPooling2D(pool_size=(2, 2)))
    classifier.add(Flatten())
    classifier.add(Dense(units=32, activation='relu'))
    classifier.add(Dropout(0.5))
    classifier.add(Dense(units=1, activation='sigmoid'))
    classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

    classifier.fit(train_ds, validation_data=test_ds, epochs=2, verbose=0)
    loss, accuracy = classifier.evaluate(test_ds, verbose=0)
    accuracies_on_test_set[f'epoch_{i + 1}_accuracy'] = accuracy

print(accuracies_on_test_set)

{'epoch_1_accuracy': 0.8235, 'epoch_2_accuracy': 0.7765, 'epoch_3_accuracy': 0.736}

以下是数据集的轮换:

from collections import deque

groups = ['group1', 'group2', 'group3']

for i in range(3):
    d = deque(groups)
    d.rotate(-i)
    print(list(d))

['group1', 'group2', 'group3']
['group2', 'group3', 'group1']
['group3', 'group1', 'group2']

它们全部轮流作为最后一个,随后作为对所有其他对象的测试集:

They all take turns being the last, which is subsequently taken as the test set against all the others:

train1, train2, test1 = d

这篇关于如何为此CNN使用K折交叉验证?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆