How to measure overfitting when the train and validation samples are small in a Keras model


Problem description

I have the following plot:

[training/validation accuracy and loss curves across epochs, produced by the plot(history) call below]

The model is created with the following number of samples:

                class1     class2
train             20         20
validate          21         13

In my understanding, the plot shows there is no overfitting. But since the sample is very small, I'm not confident that the model generalizes well enough.

Is there any other way to measure overfitting besides the above plot?

Here is my complete code:

library(keras)
library(tidyverse)


train_dir <- "data/train/"
validation_dir <- "data/validate/"



# Making model ------------------------------------------------------------


conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(150, 150, 3)
)

# VGG16 based model -------------------------------------------------------

# Works better with regularizer
model <- keras_model_sequential() %>%
  conv_base() %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu", kernel_regularizer = regularizer_l1(l = 0.01)) %>%
  layer_dense(units = 1, activation = "sigmoid")

summary(model)

length(model$trainable_weights)
freeze_weights(conv_base)
length(model$trainable_weights)


# Train model -------------------------------------------------------------
desired_batch_size <- 20 

train_datagen <- image_data_generator(
  rescale = 1 / 255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest"
)

# Note that the validation data shouldn't be augmented!
test_datagen <- image_data_generator(rescale = 1 / 255)


train_generator <- flow_images_from_directory(
  train_dir, # Target directory
  train_datagen, # Data generator
  target_size = c(150, 150), # Resizes all images to 150 × 150
  shuffle = TRUE,
  seed = 1,
  batch_size = desired_batch_size, # was 20
  class_mode = "binary" # binary_crossentropy loss for binary labels
)

validation_generator <- flow_images_from_directory(
  validation_dir,
  test_datagen,
  target_size = c(150, 150),
  shuffle = TRUE,
  seed = 1,
  batch_size = desired_batch_size,
  class_mode = "binary"
)

# Fine tuning -------------------------------------------------------------

unfreeze_weights(conv_base, from = "block3_conv1")

# Compile model -----------------------------------------------------------



model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)


# Evaluate  by epochs  ---------------------------------------------------------------


# Fit the model; history records accuracy/loss across epochs (slow)
history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = 100,
  epochs = 15, # was 50
  validation_data = validation_generator,
  validation_steps = 50
)

plot(history)

Answer

Two things here:

  1. Stratify your data with respect to classes: your validation data has a completely different class distribution than your training set (the training set is balanced, whereas the validation set is not). This may distort your loss and metric values. It's better to stratify the split so that the class ratio is the same in both sets. A sketch of such a split follows.
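A minimal sketch of a stratified split in R, in the spirit of the question's own code. The data/all directory and the 60/40 ratio are illustrative assumptions, not from the question:

library(tidyverse)

# Collect every image path; the parent folder name serves as the class label.
all_images <- tibble(
  path  = list.files("data/all", recursive = TRUE, full.names = TRUE),
  class = basename(dirname(path))
)

# Sample within each class so that both sets keep the same class ratio.
set.seed(1)
split <- all_images %>%
  group_by(class) %>%
  mutate(set = ifelse(seq_len(n()) %in% sample(n(), round(0.6 * n())),
                      "train", "validate")) %>%
  ungroup()

split %>% count(class, set)  # class ratios should now match across the two sets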

With so few data points, use more thorough validation schemes: as you can see, you have only 74 images in total. In this case it's no problem to load all images into a numpy.array (you can still do data augmentation using the flow function; a loading sketch follows the list below) and use validation schemes that are hard to set up when your data lives in folders. The schemes (from sklearn) I advise you to use are:

  • stratified k-fold cross-validation: divide your data into k chunks, and for each selection of k - 1 chunks, first train your model on those k - 1 chunks and then compute the metrics on the one left out for validation. The final result is the mean of the results obtained on the validation chunks. You can of course check not only the mean but also other statistics of the loss distribution (e.g. min, max, median), and compare them with the results obtained on the training set for each fold. See the cross-validation sketch after this list.
  • leave-one-out: a special case of the previous scheme, where the number of chunks/folds equals the number of examples in your dataset. This is considered the most thorough way of measuring model performance. It is rarely used in deep learning, because training is usually too slow and datasets too big for the computation to finish in a reasonable time.
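
Since there are only 74 images in total, they fit in memory easily. A hedged sketch of loading a directory tree into an array in R follows; the load_images() helper is an assumption, not part of the question's code, while train_datagen and desired_batch_size are the objects defined in the question:

library(keras)

# Read every image under a directory into one 4-D array;
# labels are derived from the per-class folder names.
load_images <- function(dir) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE)
  x <- array(0, dim = c(length(paths), 150, 150, 3))
  for (i in seq_along(paths)) {
    img <- image_load(paths[i], target_size = c(150, 150))
    x[i, , , ] <- image_to_array(img)  # raw 0-255 values; train_datagen rescales by 1/255
  }
  y <- as.integer(basename(dirname(paths)) == "class2")  # class1 -> 0, class2 -> 1
  list(x = x, y = y)
}

all_data <- load_images(train_dir)

# Augmentation still works on in-memory data:
train_generator <- flow_images_from_data(
  all_data$x, all_data$y,
  generator  = train_datagen,
  batch_size = desired_batch_size
)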

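And a sketch of stratified k-fold evaluation on those in-memory arrays. caret::createFolds() stratifies on a factor outcome and plays the role of sklearn's StratifiedKFold here; build_model() is a hypothetical helper that recreates and compiles the question's model, so that every fold starts from fresh weights:

library(caret)
library(keras)

x <- all_data$x / 255  # rescale manually, since this sketch bypasses the generator
y <- all_data$y

k <- 5
set.seed(1)
folds <- createFolds(factor(y), k = k)  # list of stratified held-out index vectors

fold_acc <- sapply(folds, function(val_idx) {
  model <- build_model()  # hypothetical: rebuild and compile the model for each fold
  model %>% fit(
    x[-val_idx, , , , drop = FALSE], y[-val_idx],
    epochs = 15, batch_size = desired_batch_size, verbose = 0
  )
  scores <- model %>% evaluate(
    x[val_idx, , , , drop = FALSE], y[val_idx], verbose = 0
  )
  scores[["acc"]]  # may be named "accuracy" depending on the keras version
})

summary(fold_acc)  # inspect the whole distribution, not just the mean

Setting k <- length(y) turns the same loop into leave-one-out; with 74 images and a VGG16 base that means 74 training runs, which illustrates why the scheme is rarely used in deep learning.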