Checking validation results in Keras shows only 50% correct. Clearly random


Question

I'm struggling with a seemingly simple problem: I can't figure out how to match my input images to the probabilities produced by my model.

Training and validation of my model (vanilla VGG16, retrained for 2 classes, dogs and cats) are going fine and get me close to 97% validation accuracy. But when I check which predictions were right and which were wrong, I only get random results.

Found 1087 correct labels (53.08%)

I am pretty sure it has something to do with the ImageDataGenerator, which seems to produce random batches from my validation images even though I DO set shuffle = False.

I just save the filenames and classes of my generator before I run it, and I ASSUME that the index of my filenames and classes matches the order of the output probabilities.

Here's my setup (vanilla VGG16, with the last layer replaced to match the 2 categories, cats and dogs):

new_model.summary()


Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
Binary_predictions (Dense)   (None, 2)                 8194      
=================================================================
Total params: 134,268,738
Trainable params: 8,194
Non-trainable params: 134,260,544
_________________________________________________________________


batch_size=16
epochs=3
learning_rate=0.01

This is the definition of the generators for training and validation. I have not included the data augmentation part at this point.

train_datagen = ImageDataGenerator()
validation_datagen = ImageDataGenerator()
test_datagen = ImageDataGenerator()

train_generator = train_datagen.flow_from_directory(
    train_path,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')
train_filenames = train_generator.filenames
train_samples = len(train_filenames)

validation_generator = validation_datagen.flow_from_directory(
    valid_path,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=False)  # Must be False so classes and filenames can be extracted in the order they are predicted
validation_filenames = validation_generator.filenames
validation_samples = len(validation_filenames)

Fine-tuning the model works well:

#Fine-tune the model
#DOC: fit_generator(generator, steps_per_epoch, epochs=1, verbose=1, callbacks=None,
#              validation_data=None, validation_steps=None, class_weight=None,
#              max_queue_size=10, workers=1, use_multiprocessing=False, initial_epoch=0)

new_model.fit_generator(
    train_generator,
    steps_per_epoch=train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_samples // batch_size)

Epoch 1/3
1434/1434 [==============================] - 146s - loss: 0.5456 - acc: 0.9653 - val_loss: 0.5043 - val_acc: 0.9678
Epoch 2/3
1434/1434 [==============================] - 148s - loss: 0.5312 - acc: 0.9665 - val_loss: 0.4293 - val_acc: 0.9722
Epoch 3/3
1434/1434 [==============================] - 148s - loss: 0.5332 - acc: 0.9665 - val_loss: 0.4329 - val_acc: 0.9731

So does extracting the predictions for the validation data:

#We need the probabilities/scores for the validation set
#DOC: predict_generator(generator, steps, max_queue_size=10, workers=1,
#                       use_multiprocessing=False, verbose=0)
probs = new_model.predict_generator(
            validation_generator,
            steps=validation_samples // batch_size,
            verbose = 1)

#Extracting the probabilities and labels
our_predictions = probs[:,0]
our_labels = np.round(1-our_predictions)
expected_labels = validation_generator.classes
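
The `np.round(1 - probs[:, 0])` step relies on a two-column softmax output in which column 0 is the probability of class 0. A minimal sketch with made-up probabilities (the values in `probs` are illustrative and do not come from the model above) suggests that `np.argmax` yields the same labels for two classes, and it would also carry over to more than two:

```python
import numpy as np

# Illustrative two-class probabilities (made up; each row sums to 1).
probs = np.array([[0.90, 0.10],
                  [0.20, 0.80],
                  [0.60, 0.40],
                  [0.05, 0.95]])

# Approach from the question: class 1 whenever P(class 0) is below 0.5.
labels_round = np.round(1 - probs[:, 0]).astype(int)

# Alternative: the index of the largest probability in each row.
labels_argmax = np.argmax(probs, axis=1)

print(labels_round)
print(labels_argmax)
```

The two only differ on exact ties (a row of 0.5 / 0.5), which real softmax outputs rarely produce.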

Now, when I measure success on my validation set by comparing the expected labels with the computed labels, I get something suspiciously close to random:

correct = np.where(our_labels==expected_labels)[0]
print("Found {:3d} correct labels ({:.2f}%)".format(len(correct),
       100*len(correct)/len(our_predictions)))

Found 1087 correct labels (53.08%)

Clearly this is not correct.
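
The printed line is enough to recover how many predictions were compared. A small arithmetic sketch (only the three numbers come from the post; the search range is an arbitrary choice) looks for every total that would print as 53.08% with 1087 correct:

```python
correct = 1087        # from the printed result
reported = "53.08"    # from the printed result
batch_size = 16       # from the setup above

# Every total N for which 100 * correct / N formats to the reported value.
candidates = [n for n in range(correct, 10 * correct)
              if "{:.2f}".format(100 * correct / n) == reported]
print(candidates)

# How many full batches that total corresponds to, and the remainder.
total = candidates[0]
print(divmod(total, batch_size))
```

The only match is 2048 predictions, which is 128 full batches of 16 with nothing left over.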

I suspect this has something to do with the randomness of the generators, but I set shuffle = False.
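
Setting shuffle = False fixes the order of the files, but not the position an already-used iterator starts from. The toy below is not Keras: `CyclicBatches`, `predict_all` and `accuracy` are hypothetical names, and the "model" simply echoes the true label. It only illustrates how predictions can drift out of step with a saved `classes` array when batches were consumed earlier, for example by a validation pass or a prefetching queue. Whether that is what happened in this case is an assumption, not something the post establishes.

```python
class CyclicBatches:
    """Toy stand-in for a non-shuffling batch iterator (hypothetical, not Keras)."""

    def __init__(self, labels, batch_size):
        self.labels = list(labels)
        self.batch_size = batch_size
        self.position = 0  # index of the next sample to serve

    def reset(self):
        """Go back to the first sample."""
        self.position = 0

    def next_batch(self):
        """Serve the next batch, wrapping around at the end of the data."""
        n = len(self.labels)
        batch = [self.labels[(self.position + k) % n]
                 for k in range(self.batch_size)]
        self.position = (self.position + self.batch_size) % n
        return batch


def predict_all(batches, steps):
    """Pretend model that is always right about the sample it is served."""
    out = []
    for _ in range(steps):
        out.extend(batches.next_batch())
    return out


def accuracy(predicted, expected):
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# 8 "cats" (0) followed by 8 "dogs" (1), in directory order.
expected = [0] * 8 + [1] * 8
gen = CyclicBatches(expected, batch_size=4)

gen.next_batch()                    # one batch was consumed earlier

stale = predict_all(gen, steps=4)   # starts at sample 4, not sample 0
gen.reset()
fresh = predict_all(gen, steps=4)   # starts at sample 0

print(accuracy(stale, expected))
print(accuracy(fresh, expected))
```

Even a perfect model scores poorly against the saved labels when the iterator starts mid-way, and the exact figure depends on the offset. Keras iterators expose a `reset()` method, so calling `validation_generator.reset()` right before `predict_generator` would be the analogous step to try.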

This code was copied DIRECTLY from the Fast.ai course by the great Jeremy Howard, but I can't get it to work anymore.

I am using Keras 2.0.8 with the TensorFlow 1.3 backend on Python 3.5 under Anaconda.

Please help me keep my sanity!

Recommended answer

I ran into a similar problem before. I find predict_generator() unfriendly, so I wrote a function to test the data set myself. Here is my code snippet:

import json
import os

import numpy as np
from PIL import Image

# Assumes new_model, img_width and img_height are defined as in the question.


def get_img_result(img_path):
    """Load one image, preprocess it, and return the model's prediction."""
    image = Image.open(img_path)
    image.load()
    image = image.resize((img_width, img_height))
    if image.mode != 'RGB':  # compare strings with !=, not "is not"
        image = image.convert('RGB')
    array = np.asarray(image, dtype='int32')
    # Scales pixels to [0, 1]. Keep this only if the model was trained on
    # rescaled inputs; the question's ImageDataGenerator() does not rescale.
    array = array / 255
    array = np.asarray([array])  # add the batch dimension
    result = new_model.predict(array)
    print(result)
    return result


# path: the root folder of the validation data set. validation->cat->kitty.jpg
def validate(path):
    result_list = []
    right_count = 0
    wrong_count = 0
    # Sorted so that index i matches the alphabetical class order used by
    # flow_from_directory; os.listdir alone returns names in arbitrary order.
    categories = sorted(os.listdir(path))
    for i, category in enumerate(categories):
        images = os.listdir(os.path.join(path, category))
        for image in images:
            result = get_img_result(os.path.join(path, category, image))[0]
            # Right when the score of the true class is the highest score.
            right = int(result[i] == max(result))
            result_list.append({'image': image, 'category': category,
                                'score': result.tolist(), 'right': right})
            if right:
                right_count += 1
            else:
                wrong_count += 1
    with open('result.json', 'w') as f:
        json.dump(result_list, f)
    print('right count : {0} \n wrong count : {1} \n accuracy : {2}'.format(
        right_count, wrong_count, right_count / (right_count + wrong_count)))

I use PIL to convert each image to a NumPy array, as Keras does, then test all images and save the results to a JSON file.
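
One detail matters when walking the directory by hand: `flow_from_directory` assigns class indices in sorted order of the sub-directory names (exposed as `class_indices` on the generator), whereas `os.listdir` makes no ordering promise. A minimal sketch of building the same mapping manually, where `class_indices_from_dir` is a hypothetical helper and the directory tree is a throwaway created only for the demonstration:

```python
import os
import tempfile


def class_indices_from_dir(root):
    """Map each class sub-directory to an index, in sorted name order."""
    class_names = sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a temporary tree shaped like validation/<class>/.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("dogs", "cats"):  # created out of order on purpose
        os.makedirs(os.path.join(root, class_name))
    mapping = class_indices_from_dir(root)

print(mapping)
```

Comparing such a mapping with `validation_generator.class_indices` is a quick way to confirm that column `i` of the model output means the class you think it does.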

Hope it helps.
