Why is accuracy from fit_generator different to that from evaluate_generator in Keras?
Problem description
What I do:
- I am training a pre-trained CNN with Keras fit_generator(). This produces evaluation metrics (loss, acc, val_loss, val_acc) after each epoch. After training the model, I produce evaluation metrics (loss, acc) with evaluate_generator().
What I expect:
- If I train the model for one epoch, I would expect the metrics obtained with fit_generator() and evaluate_generator() to be the same. Both should derive the metrics from the entire dataset.
What I observe:
- Both loss and acc differ between fit_generator() and evaluate_generator().
What I don't understand:
- Why the accuracy from fit_generator() is different to that from evaluate_generator().
My code:
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory=path,                      # path to the target directory
        target_size=(imagesize, imagesize),  # dimensions to which all images found will be resized
        color_mode='rgb',                    # whether the images will be converted to have 1, 3, or 4 channels
        classes=None,                        # optional list of class subdirectories
        class_mode='categorical',            # type of label arrays that are returned
        batch_size=nBatches,                 # size of the batches of data
        shuffle=True)                        # whether to shuffle the data
    return generator
[...]
def train_model(model, nBatches, nEpochs, trainGenerator, valGenerator, resultPath):
    history = model.fit_generator(
        generator=trainGenerator,
        steps_per_epoch=trainGenerator.samples//nBatches,  # total number of steps (batches of samples)
        epochs=nEpochs,                                    # number of epochs to train the model
        verbose=2,                                         # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
        callbacks=None,                                    # keras.callbacks.Callback instances to apply during training
        validation_data=valGenerator,                      # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
        validation_steps=valGenerator.samples//nBatches,   # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
        class_weight=None,                                 # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
        max_queue_size=10,                                 # maximum size for the generator queue
        workers=32,                                        # maximum number of processes to spin up when using process-based threading
        use_multiprocessing=True,                          # whether to use process-based threading
        shuffle=False,                                     # whether to shuffle the order of the batches at the beginning of each epoch
        initial_epoch=0)                                   # epoch at which to start training
    print("%s: Model trained." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    # Save model
    modelPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelArchitecture.h5')
    weightsPath = os.path.join(resultPath, datetime.now().strftime('%Y-%m-%d_%H-%M-%S') + '_modelWeights.h5')
    model.save(modelPath)
    model.save_weights(weightsPath)
    print("%s: Model saved." % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    return history, model
[...]
def evaluate_model(model, generator):
    score = model.evaluate_generator(
        generator=generator,                 # Generator yielding tuples
        steps=generator.samples//nBatches)   # number of steps (batches of samples) to yield from generator before stopping
    print("%s: Model evaluated:"
          "\n\t\t\t\t\t\t Loss: %.3f"
          "\n\t\t\t\t\t\t Accuracy: %.3f" %
          (datetime.now().strftime('%Y-%m-%d_%H-%M-%S'),
           score[0], score[1]))
[...]
def main():
    # Create model
    modelUntrained = create_model(imagesize, nBands, nClasses)
    # Prepare training and validation data
    trainGenerator = generate_data(imagePathTraining, imagesize, nBatches)
    valGenerator = generate_data(imagePathValidation, imagesize, nBatches)
    # Train and save model
    history, modelTrained = train_model(modelUntrained, nBatches, nEpochs, trainGenerator, valGenerator, resultPath)
    # Evaluate on validation data
    print("%s: Model evaluation (valX, valY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, valGenerator)
    # Evaluate on training data
    print("%s: Model evaluation (trainX, trainY):" % datetime.now().strftime('%Y-%m-%d_%H-%M-%S'))
    evaluate_model(modelTrained, trainGenerator)
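One detail in the code above may matter for the expectation that both calls see the entire dataset: steps computed as samples//nBatches round down, so when the sample count is not a multiple of the batch size, the last partial batch is not drawn. The sketch below only illustrates that arithmetic; the sample count and batch size are made-up numbers, and the helper names are hypothetical rather than part of Keras.

```python
import math

def steps_floor(n_samples, batch_size):
    # What samples // nBatches computes: the partial last batch is dropped.
    return n_samples // batch_size

def steps_ceil(n_samples, batch_size):
    # Rounds up so that the partial last batch is also requested.
    return math.ceil(n_samples / batch_size)

# Hypothetical dataset size and batch size, chosen only for illustration.
n_samples, batch_size = 1050, 32

covered = steps_floor(n_samples, batch_size) * batch_size
skipped = n_samples - covered

print("floor steps:", steps_floor(n_samples, batch_size))
print("ceil steps: ", steps_ceil(n_samples, batch_size))
print("samples skipped with floor:", skipped)
```

With shuffle=True, the skipped samples can differ from one pass to the next, so two passes over "the same" generator need not score exactly the same subset.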
Update
I found some sites that report on this issue:
- The Batch Normalization layer of Keras is broken
- Strange behaviour of the loss function in keras model, with pretrained convolutional base
- model.evaluate() gives a different loss on training data from the one in training process
- Got different accuracy between history and evaluate
- ResNet: 100% accuracy during training, but 33% prediction accuracy with the same data
I tried following some of their suggested solutions, without success so far. acc and loss still differ between fit_generator() and evaluate_generator(), even when using the exact same data generated with the same generator for training and validation. Here is what I tried:
- statically setting the learning_phase for the whole script, or before adding new layers to the pre-trained ones
K.set_learning_phase(0) # testing
K.set_learning_phase(1) # training
- unfreezing all batch normalization layers from the pre-trained model
for layer in model.layers:
    if layer.name.startswith('bn'):
        layer.trainable = True
- not adding dropout or batch normalization as untrained layers
# Create pre-trained base model
basemodel = ResNet50(include_top=False, # exclude final pooling and fully connected layer in the original model
weights='imagenet', # pre-training on ImageNet
input_tensor=None, # optional tensor to use as image input for the model
input_shape=(imagesize, # shape tuple
imagesize,
nBands),
pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer
classes=nClasses) # number of classes to classify images into
# Create new untrained layers
x = basemodel.output
x = GlobalAveragePooling2D()(x) # global spatial average pooling layer
x = Dense(1024, activation='relu')(x) # fully-connected layer
y = Dense(nClasses, activation='softmax')(x) # logistic layer making sure that probabilities sum up to 1
# Create model combining pre-trained base model and new untrained layers
model = Model(inputs=basemodel.input,
outputs=y)
# Freeze weights on pre-trained layers
for layer in basemodel.layers:
    layer.trainable = False
# Define learning optimizer
learningRate = 0.01
optimizerSGD = optimizers.SGD(lr=learningRate, # learning rate.
momentum=0.9, # parameter that accelerates SGD in the relevant direction and dampens oscillations
decay=learningRate/nEpochs, # learning rate decay over each update
nesterov=True) # whether to apply Nesterov momentum
# Compile model
model.compile(optimizer=optimizerSGD, # stochastic gradient descent optimizer
loss='categorical_crossentropy', # objective function
metrics=['accuracy'], # metrics to be evaluated by the model during training and testing
loss_weights=None, # scalar coefficients to weight the loss contributions of different model outputs
sample_weight_mode=None, # sample-wise weights
weighted_metrics=None, # metrics to be evaluated and weighted by sample_weight or class_weight during training and testing
target_tensors=None) # tensor model's target, which will be fed with the target data during training
- using different pre-trained CNNs as base model (VGG19, InceptionV3, InceptionResNetV2, Xception)
from keras.applications.vgg19 import VGG19
basemodel = VGG19(include_top=False, # exclude final pooling and fully connected layer in the original model
weights='imagenet', # pre-training on ImageNet
input_tensor=None, # optional tensor to use as image input for the model
input_shape=(imagesize, # shape tuple
imagesize,
nBands),
pooling=None, # output of the model will be the 4D tensor output of the last convolutional layer
classes=nClasses) # number of classes to classify images into
Please let me know if there are other solutions around that I am missing.
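Several of the reports listed above attribute the gap to batch normalization behaving differently in the training and inference phases. The following is a minimal pure-Python sketch of that idea, not Keras code: in training mode a batch is normalized with its own statistics, while in inference mode stored moving statistics are used. The batch values and the stored statistics are made-up assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def batch_norm(values, mu, var, eps=1e-3):
    # Normalize each activation with the given mean and variance.
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

# Hypothetical activations from the new dataset.
batch = [4.0, 5.0, 6.0, 7.0]

# Hypothetical moving statistics carried over from pre-training.
moving_mean, moving_var = 0.0, 1.0

# Training phase: statistics come from the current batch.
train_out = batch_norm(batch, mean(batch), variance(batch))

# Inference phase: statistics come from the stored moving averages.
infer_out = batch_norm(batch, moving_mean, moving_var)

print("training-phase output mean: ", mean(train_out))
print("inference-phase output mean:", mean(infer_out))
```

When the stored statistics do not match the new data, the layers after batch normalization receive differently scaled inputs in the two phases, which is one way the same images can score differently during fitting and evaluation.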
Recommended answer
I have now managed to get the same evaluation metrics. I changed the following:
- I set seed in flow_from_directory() as suggested by @Anakin.
def generate_data(path, imagesize, nBatches):
    datagen = ImageDataGenerator(rescale=1./255)
    generator = datagen.flow_from_directory(
        directory=path,                      # path to the target directory
        target_size=(imagesize, imagesize),  # dimensions to which all images found will be resized
        color_mode='rgb',                    # whether the images will be converted to have 1, 3, or 4 channels
        classes=None,                        # optional list of class subdirectories
        class_mode='categorical',            # type of label arrays that are returned
        batch_size=nBatches,                 # size of the batches of data
        shuffle=True,                        # whether to shuffle the data
        seed=42)                             # random seed for shuffling and transformations
    return generator
- I set use_multiprocessing=False in fit_generator() according to the warning: use_multiprocessing=True and multiple workers may duplicate your data
history = model.fit_generator(generator=trainGenerator,
steps_per_epoch=trainGenerator.samples//nBatches, # total number of steps (batches of samples)
epochs=nEpochs, # number of epochs to train the model
verbose=2, # verbosity mode. 0 = silent, 1 = progress bar, 2 = one line per epoch
callbacks=callback, # keras.callbacks.Callback instances to apply during training
validation_data=valGenerator, # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
validation_steps=
valGenerator.samples//nBatches, # number of steps (batches of samples) to yield from validation_data generator before stopping at the end of every epoch
class_weight=None, # optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function
max_queue_size=10, # maximum size for the generator queue
workers=1, # maximum number of processes to spin up when using process-based threading
use_multiprocessing=False, # whether to use process-based threading
shuffle=False, # whether to shuffle the order of the batches at the beginning of each epoch
initial_epoch=0) # epoch at which to start training
- I unified my Python setup as suggested in the Keras documentation on how to obtain reproducible results using Keras during development
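import numpy as np
import tensorflow as tf
import random as rn
from keras import backend as K
# Fix the seeds of NumPy and Python's random module
np.random.seed(42)
rn.seed(12345)
# Force single-threaded execution; multiple threads are a potential source of non-reproducible results
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
# Fix the seed of the TensorFlow backend
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)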
- Instead of rescaling input images with datagen = ImageDataGenerator(rescale=1./255), I now generate my data with:
from keras.applications.resnet50 import preprocess_input
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
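To see why this last change can matter, the sketch below compares the two input transformations on a single pixel. It is a pure-Python illustration and does not call Keras: rescale=1./255 maps values to [0, 1], whereas preprocess_input for ResNet50 is, as far as I know, the "caffe"-style transformation (RGB to BGR, then subtraction of the ImageNet channel means, with no scaling). The helper names and the example pixel are assumptions for illustration.

```python
# ImageNet channel means in BGR order, as used by caffe-style preprocessing.
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)

def rescale(pixel_rgb):
    # Equivalent of rescale=1./255: values end up in [0, 1].
    return tuple(c / 255.0 for c in pixel_rgb)

def caffe_style(pixel_rgb):
    # Swap RGB to BGR, then subtract the per-channel mean (no scaling).
    r, g, b = pixel_rgb
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_BGR_MEAN))

# Hypothetical pixel in RGB order.
pixel = (255.0, 128.0, 0.0)

print("rescaled:   ", rescale(pixel))
print("caffe-style:", caffe_style(pixel))
```

The pre-trained weights were fitted to inputs in the second value range and channel order, so feeding [0, 1] values into the frozen base presents it with inputs unlike those it was trained on.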
With these changes, I managed to get similar accuracy and loss from fit_generator() and evaluate_generator(). Also, using the same data for training and testing now results in similar metrics. Reasons for the remaining differences are provided in the Keras documentation.
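One reason given in the Keras FAQ for such remaining differences is that the training loss reported for an epoch is an average over the batches seen while the weights were still changing, whereas evaluation uses the weights as they are at the end. The toy sketch below, with a one-parameter model and made-up data, shows that effect in isolation; it is not Keras code.

```python
def batch_loss(w, xs):
    # Mean squared error of the model y = w * x against the target y = 2 * x.
    return sum((w * x - 2.0 * x) ** 2 for x in xs) / len(xs)

def batch_grad(w, xs):
    # Gradient of batch_loss with respect to w.
    return sum(2.0 * (w * x - 2.0 * x) * x for x in xs) / len(xs)

# Hypothetical inputs, split into batches of two.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
batches = [xs[i:i + 2] for i in range(0, len(xs), 2)]

w, lr = 0.0, 0.1
losses_seen = []
for batch in batches:
    # The loss is recorded before the update, as during training.
    losses_seen.append(batch_loss(w, batch))
    w -= lr * batch_grad(w, batch)

# Analogue of the loss logged by fitting for one epoch.
reported_during_fit = sum(losses_seen) / len(losses_seen)
# Analogue of evaluating afterwards on the same data.
evaluated_after_fit = batch_loss(w, xs)

print("average loss during the epoch:", reported_during_fit)
print("loss with the final weights:  ", evaluated_after_fit)
```

Even on identical data, the running average includes losses from earlier, worse weights, so it does not have to match the value computed afterwards.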