What is the difference between the terms accuracy and validation accuracy?

Question

I have used an LSTM from Keras to build a model that detects whether two questions on Stack Overflow are duplicates. When I run the model, I see output like this for each epoch:

Epoch 23/200
727722/727722 [==============================] - 67s - loss: 0.3167 - acc: 0.8557 - val_loss: 0.3473 - val_acc: 0.8418
Epoch 24/200
727722/727722 [==============================] - 67s - loss: 0.3152 - acc: 0.8573 - val_loss: 0.3497 - val_acc: 0.8404
Epoch 25/200
727722/727722 [==============================] - 67s - loss: 0.3136 - acc: 0.8581 - val_loss: 0.3518 - val_acc: 0.8391

I am trying to understand the meaning of each of these terms. Which of the above values is the accuracy of my model? I am comparatively new to machine learning, so any explanation would help.

Recommended answer

When training a machine learning model, one of the main things you want to avoid is overfitting. This is when your model fits the training data well but cannot generalize and make accurate predictions for data it hasn't seen before.

To find out whether a model is overfitting, data scientists hold out part of their data for validation: they split the data into two parts, the training set and the validation set. (This single split is the simplest relative of cross-validation, which repeats the split several times.) The training set is used to train the model, while the validation set is used only to evaluate the model's performance.
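
The split described above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the question: the function name train_validation_split, the 20% validation fraction, and the placeholder question pairs are all assumptions made for the example.

```python
import random

def train_validation_split(samples, validation_fraction=0.2, seed=0):
    """Shuffle the samples and hold out a fraction of them for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    # The first n_val shuffled samples are held out; the rest are for training.
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical placeholder data standing in for real question pairs.
question_pairs = [("question %d" % i, "question %d" % (i + 1)) for i in range(10)]
train_set, val_set = train_validation_split(question_pairs, validation_fraction=0.2)
print(len(train_set), len(val_set))
```

The important property is that no sample appears in both sets, so the validation metrics are computed on data the model never trained on.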

Metrics on the training set let you see how your model is progressing in terms of its training, but it is the metrics on the validation set that give you a measure of the quality of your model: how well it can make new predictions on data it hasn't seen before.
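
To make "accuracy on a set" concrete, here is a small sketch that computes the fraction of correct predictions separately for a training set and a validation set. The labels and predictions are invented for illustration (1 means "duplicate", 0 means "not duplicate") and are not taken from the model in the question.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)

# Hypothetical data: 1 = duplicate, 0 = not duplicate.
train_labels      = [1, 0, 1, 1, 0, 0, 1, 0]
train_predictions = [1, 0, 1, 1, 0, 0, 1, 1]   # 7 of 8 correct
val_labels        = [1, 0, 0, 1]
val_predictions   = [1, 0, 1, 0]               # 2 of 4 correct

train_accuracy = accuracy(train_predictions, train_labels)
val_accuracy = accuracy(val_predictions, val_labels)
print(train_accuracy, val_accuracy)
```

In this made-up case the model looks strong on the data it trained on and much weaker on held-out data, which is exactly the situation the validation set is meant to expose.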

With this in mind, loss and acc are measures of loss and accuracy on the training set, while val_loss and val_acc are measures of loss and accuracy on the validation set.
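
As a rough illustration of that naming convention, the sketch below pulls the four numbers out of one of the progress lines from the question and groups them by the set they were measured on. The helper parse_keras_log_line is a hypothetical name written for this example and is not part of Keras.

```python
import re

def parse_keras_log_line(line):
    """Extract every 'name: value' pair from one progress line into a dict."""
    pairs = re.findall(r"(\w+): (\d+\.\d+)", line)
    return {name: float(value) for name, value in pairs}

line = ("727722/727722 [==============================] - 67s - "
        "loss: 0.3167 - acc: 0.8557 - val_loss: 0.3473 - val_acc: 0.8418")
metrics = parse_keras_log_line(line)

# Names with the val_ prefix were measured on the validation set,
# the others on the training set.
training_metrics = {k: v for k, v in metrics.items() if not k.startswith("val_")}
validation_metrics = {k: v for k, v in metrics.items() if k.startswith("val_")}
print(training_metrics)
print(validation_metrics)
```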

At the moment your model has an accuracy of ~86% on the training set and ~84% on the validation set. This means that you can expect your model to perform with ~84% accuracy on new data, provided the new data resembles the validation set.

I notice that as your epochs go from 23 to 25, your acc metric increases (0.8557 to 0.8581) while your val_acc metric decreases (0.8418 to 0.8391), and val_loss rises as well. This means that your model is fitting the training set better but is losing its ability to predict on new data, indicating that your model is starting to fit noise and is beginning to overfit.
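
The same reading of the log can be written as a small check. The sketch below uses only the three epochs shown in the question; the helper names and the rule "training accuracy rising while validation accuracy falls" are a simplification chosen for this example, not a standard test for overfitting.

```python
# Metrics copied from epochs 23 to 25 of the log in the question.
history = [
    {"epoch": 23, "loss": 0.3167, "acc": 0.8557, "val_loss": 0.3473, "val_acc": 0.8418},
    {"epoch": 24, "loss": 0.3152, "acc": 0.8573, "val_loss": 0.3497, "val_acc": 0.8404},
    {"epoch": 25, "loss": 0.3136, "acc": 0.8581, "val_loss": 0.3518, "val_acc": 0.8391},
]

def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))

def is_strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))

acc = [h["acc"] for h in history]
val_acc = [h["val_acc"] for h in history]

# Training accuracy keeps improving while validation accuracy keeps dropping.
overfitting_suspected = is_strictly_increasing(acc) and is_strictly_decreasing(val_acc)

# The epoch with the lowest validation loss is the best candidate to keep.
best = min(history, key=lambda h: h["val_loss"])

# Gap between training and validation accuracy at the last epoch shown.
gap = history[-1]["acc"] - history[-1]["val_acc"]

print(overfitting_suspected, best["epoch"], round(gap, 4))
```

Three epochs are too few to be sure of a trend, so in practice you would watch the validation metrics over a longer window. Keras also provides an EarlyStopping callback that stops training once a monitored metric such as val_loss stops improving.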

So that is a quick explanation of validation metrics and how to interpret them.
