Validation and testing accuracy widely different


Problem description

I am currently working on a dataset in Kaggle. After training the model on the training data, I tested it on the validation data and got an accuracy of around 0.49.

However, the same model gives an accuracy of 0.05 on the testing data.

I am using a neural network as my model.

So, what are the possible reasons for this to happen, and how does one begin to check and correct these issues?

Answer

Possible reasons for the large generalization gap:

  1. Different distributions: The validation and test sets might come from different distributions. Try to verify that they are indeed sampled from the same process in your code.
  2. Number of samples: The size of the validation and/or the test set is too small. This means the empirical data distributions differ too much, which would explain the different reported accuracies. One example would be a dataset consisting of thousands of images, but also thousands of classes. Then the test set might contain some classes that are not in the validation set (and vice versa). Use cross-validation to check whether the test accuracy is always lower than the validation accuracy, or whether the two just generally differ a lot from fold to fold (see the first sketch after this list).
  3. Hyperparameter overfitting: This is also related to the size of the two sets. Did you do hyperparameter tuning? If so, check whether the accuracy gap existed before you tuned the hyperparameters, as you might have "overfitted" the hyperparameters on the validation set.
  4. Loss function vs. accuracy: You reported different accuracies. Did you also check the train, validation, and test losses? You train your model on the loss function, so this is the most direct performance measure (see the second sketch below). If the accuracy is only loosely coupled to your loss function and the test loss is approximately as low as the validation loss, that might explain the accuracy gap.
  5. Bug in the code: If the test and validation sets are sampled from the same process and are sufficiently large, they are interchangeable. This means the test and validation losses must be approximately equal. So, if you have checked the four points above, my next best guess would be a bug in the code, e.g. you accidentally trained your model on the validation set as well (the third sketch below shows a leak-free split). You might want to train your model on a larger dataset and then check whether the accuracies still diverge.
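
A minimal sketch of the cross-validation check from point 2, assuming scikit-learn is available; `X`, `y`, and the `MLPClassifier` stand-in for your own network are hypothetical placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier  # stand-in for your own network

# Placeholder data; substitute your actual features and labels.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 5, size=1000)

fold_acc = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    # Fit a fresh model on each fold so the folds stay independent.
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
    model.fit(X[train_idx], y[train_idx])
    fold_acc.append(model.score(X[val_idx], y[val_idx]))

# Wildly varying per-fold accuracies point to small-sample noise (point 2);
# a consistent gap points to one of the other causes.
print("fold accuracies:", np.round(fold_acc, 3))
print(f"mean = {np.mean(fold_acc):.3f}, std = {np.std(fold_acc):.3f}")
```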
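
For point 4, a sketch of reporting the loss next to the accuracy on every split. The question does not name a framework, so this assumes Keras; the tiny model and random arrays are placeholders for your own network and splits:

```python
import numpy as np
from tensorflow import keras

# Placeholder splits; replace with your actual train/validation/test data.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((800, 20)), rng.integers(0, 5, 800)
X_val, y_val = rng.random((100, 20)), rng.integers(0, 5, 100)
X_test, y_test = rng.random((100, 20)), rng.integers(0, 5, 100)

# Placeholder network; the point is only the per-split evaluation below.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5, verbose=0)

# The model optimizes the loss, not the accuracy: if the test loss is about
# as low as the validation loss while the accuracies diverge, point 4 is the
# likely explanation rather than a data problem.
for name, (X_s, y_s) in [("train", (X_train, y_train)),
                         ("validation", (X_val, y_val)),
                         ("test", (X_test, y_test))]:
    loss, acc = model.evaluate(X_s, y_s, verbose=0)
    print(f"{name:>10}: loss={loss:.4f}  accuracy={acc:.4f}")
```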
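
And for point 5, a minimal sketch of a leak-free split with an explicit disjointness check, assuming pandas and scikit-learn; the DataFrame `df` and its `target` column are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical placeholder frame; replace with your actual dataset.
df = pd.DataFrame({"feature": range(1000),
                   "target": [i % 5 for i in range(1000)]})

# Split once into disjoint 60/20/20 train/validation/test sets.
train_df, holdout_df = train_test_split(
    df, test_size=0.4, stratify=df["target"], random_state=0)
val_df, test_df = train_test_split(
    holdout_df, test_size=0.5, stratify=holdout_df["target"], random_state=0)

# Disjointness check: any shared row index here means the model could have
# seen validation or test rows during training.
assert not set(train_df.index) & set(val_df.index)
assert not set(train_df.index) & set(test_df.index)
assert not set(val_df.index) & set(test_df.index)
```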
