How much difference between training and test error is considered suitable?


Problem description


I am working on a regression problem, using AdaBoost with decision trees for regression and r^2 as the evaluation measure. I want to know how much difference between the training r^2 and the testing r^2 is considered suitable. My training r^2 is 0.9438 and my testing r^2 is 0.877. Is this over-fitting, or is it good? I just want to know exactly how much difference between training and testing is acceptable or suitable.
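For concreteness, a minimal sketch of the setup described above, assuming scikit-learn's AdaBoostRegressor; the synthetic dataset and the hyperparameters are placeholders, not the asker's actual data:

```python
# Hedged sketch of the question's setup: AdaBoost over decision trees
# for regression, scored with r^2. Dataset and hyperparameters are
# illustrative assumptions only.
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                          n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# .score() returns r^2 for scikit-learn regressors
print("training r^2:", model.score(X_train, y_train))
print("testing  r^2:", model.score(X_test, y_test))
```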

Solution

There are several issues with your question.

To start with, r^2 is certainly not recommended as a performance evaluation measure for predictive problems; quoting from my own answer in another SO thread:

the whole R-squared concept comes in fact directly from the world of statistics, where the emphasis is on interpretative models, and it has little use in machine learning contexts, where the emphasis is clearly on predictive models; at least AFAIK, and beyond some very introductory courses, I have never (I mean never...) seen a predictive modeling problem where the R-squared is used for any kind of performance assessment; nor is it an accident that popular machine learning introductions, such as Andrew Ng's Machine Learning at Coursera, do not even bother to mention it. And, as noted in a related Github thread (emphasis added):

In particular when using a test set, it's a bit unclear to me what the R^2 means.

with which I certainly concur.
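As a hedged illustration of what to use instead, error-based metrics such as MAE and RMSE are more conventional for predictive regression tasks; this sketch reuses the `model`, `X_test`, and `y_test` names assumed in the snippet above:

```python
# Error-based evaluation in place of r^2 (MAE and RMSE); `model`,
# `X_test`, and `y_test` come from the illustrative snippet above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_pred = model.predict(X_test)
print("MAE :", mean_absolute_error(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
```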

Second:

My training r^2 is 0.9438 and my testing r^2 is 0.877. Is this over-fitting, or is it good?

A difference between a training and a test score by itself does not signify overfitting. This is just the generalization gap, i.e. the expected gap in performance between the training and validation sets; quoting from a recent blog post by Google AI:

An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.
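For the scores reported in the question, the generalization gap is just the difference between the two numbers:

```python
# Generalization gap for the r^2 scores quoted in the question.
train_r2, test_r2 = 0.9438, 0.877
print(f"generalization gap: {train_r2 - test_r2:.4f}")  # 0.0668
```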

The telltale signature of overfitting is when your validation loss starts increasing, while your training loss continues decreasing, i.e.:

(image adapted from the Wikipedia entry on overfitting - different things may lie on the horizontal axis, e.g. here the number of boosted trees)
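One way to look for this signature in practice (a sketch, assuming the scikit-learn model from the first snippet): AdaBoostRegressor's staged_predict yields predictions after each boosting iteration, so training and test error can be tracked against the number of trees, i.e. the horizontal axis of the figure:

```python
# Track train/test error per boosting iteration to spot the overfitting
# signature; `model`, `X_train`, etc. come from the first sketch above.
import numpy as np
from sklearn.metrics import mean_squared_error

train_err, test_err = [], []
for pred_tr, pred_te in zip(model.staged_predict(X_train),
                            model.staged_predict(X_test)):
    train_err.append(mean_squared_error(y_train, pred_tr))
    test_err.append(mean_squared_error(y_test, pred_te))

# Overfitting shows up as test error rising while training error keeps
# falling; argmin finds the iteration where test error bottoms out.
best_iter = int(np.argmin(test_err)) + 1
print("test error is lowest at", best_iter, "trees")
```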

I just want to know exactly how much difference between training and testing is acceptable or suitable?

There is no general answer to this question; everything depends on the details of your data and the business problem you are trying to solve.
