Calculate evaluation metrics using cross_val_predict sklearn


Question

The sklearn.model_selection.cross_val_predict documentation page states:


Generate cross-validated estimates for each input data point. It is not appropriate to pass these predictions into an evaluation metric.

Can someone explain what this means? If it gives an estimate of Y (the predicted y) for every Y (the true y), why can't I use these results to calculate metrics such as RMSE or the coefficient of determination?

Answer

It seems to be based on how samples are grouped and predicted. From the user guide linked in the cross_val_predict docs:


Warning Note on inappropriate usage of cross_val_predict


The result of cross_val_predict may be different from those obtained using cross_val_score as the elements are grouped in different ways. The function cross_val_score takes an average over cross-validation folds, whereas cross_val_predict simply returns the labels (or probabilities) from several distinct models undistinguished. Thus, cross_val_predict is not an appropriate measure of generalisation error.
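The phrase "several distinct models undistinguished" can be made concrete. The sketch below rebuilds cross_val_predict by hand. It assumes that an integer cv=3 for a regressor means an unshuffled KFold(n_splits=3), and it uses the same 200-sample diabetes subset and default Lasso as the example further down. The names y_manual and fold_of are illustrative only. Each fold gets its own freshly fitted model, and each sample is predicted exactly once, by the model that did not see it. The returned array keeps no record of which model produced which prediction.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.base import clone
from sklearn.model_selection import KFold, cross_val_predict

# Same data subset and estimator as the sklearn example in this answer.
X, y = datasets.load_diabetes(return_X_y=True)
X, y = X[:200], y[:200]
lasso = linear_model.Lasso()

# Assumption: for a regressor, cv=3 means unshuffled 3-fold KFold.
cv = KFold(n_splits=3)

y_manual = np.empty_like(y, dtype=float)  # hypothetical name: hand-built predictions
fold_of = np.empty(len(y), dtype=int)     # hypothetical name: which fold's model predicted each sample
for k, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    model = clone(lasso).fit(X[train_idx], y[train_idx])  # a distinct model per fold
    y_manual[test_idx] = model.predict(X[test_idx])       # each sample predicted once, out of fold
    fold_of[test_idx] = k

y_pred = cross_val_predict(lasso, X, y, cv=cv)
print(np.allclose(y_pred, y_manual))  # True: same pooled predictions
print(np.bincount(fold_of))           # fold sizes; y_pred itself does not carry this information
```

The manual version shows that the result depends on how the samples were split. cross_val_predict concatenates the predictions back into input order and discards the fold labels.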

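cross_val_score computes the metric separately on each fold and then averages those per-fold scores. cross_val_predict instead pools the out-of-fold predictions of several distinct models into one array, without recording which model produced which prediction. A metric computed on that pooled array is therefore a different quantity from the fold-averaged score, and it is not necessarily a good measure of generalization error. For example, using the sample code from the sklearn page: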

import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.metrics import mean_squared_error, make_scorer
diabetes = datasets.load_diabetes()
X = diabetes.data[:200]
y = diabetes.target[:200]
lasso = linear_model.Lasso()
# Out-of-fold predictions from 3 distinct models, pooled into one array
y_pred = cross_val_predict(lasso, X, y, cv=3)

# MSE computed once over all pooled predictions
print("Cross Val Prediction score:{}".format(mean_squared_error(y,y_pred)))

# Mean of the 3 per-fold MSEs
print("Cross Val Score:{}".format(np.mean(cross_val_score(lasso, X, y, cv=3, scoring = make_scorer(mean_squared_error)))))

Cross Val Prediction score:3993.771257795029
Cross Val Score:3997.1789145156217
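The small gap between the two numbers has a simple explanation for MSE. The sketch below uses the same 200-sample subset and assumes unshuffled KFold(n_splits=3) as the meaning of cv=3. The pooled MSE is exactly the fold-size-weighted mean of the per-fold MSEs, while cross_val_score reports their unweighted mean. With near-equal folds (67, 67, 66) the two values are close.

R² has no such identity. Each per-fold R² uses that fold's own mean of y as its baseline, whereas the pooled R² uses the overall mean, so the pooled value is not an average of the fold scores.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score

X, y = datasets.load_diabetes(return_X_y=True)
X, y = X[:200], y[:200]
lasso = linear_model.Lasso()
cv = KFold(n_splits=3)  # assumption: equivalent to cv=3 for a regressor

y_pred = cross_val_predict(lasso, X, y, cv=cv)

# Per-fold metrics computed on the same out-of-fold predictions.
folds = [test for _, test in cv.split(X, y)]
sizes = np.array([len(t) for t in folds])
fold_mse = np.array([mean_squared_error(y[t], y_pred[t]) for t in folds])
fold_r2 = np.array([r2_score(y[t], y_pred[t]) for t in folds])

pooled_mse = mean_squared_error(y, y_pred)
# Pooled MSE equals the fold-size-weighted mean of per-fold MSEs...
print(np.isclose(pooled_mse, np.average(fold_mse, weights=sizes)))  # True
# ...whereas the cross_val_score approach takes the unweighted mean.
print(pooled_mse, fold_mse.mean())

# R^2 does not decompose like this: each fold has its own baseline mean of y.
print(r2_score(y, y_pred), fold_r2.mean())
# The per-fold R^2 values match what cross_val_score reports.
print(np.allclose(fold_r2, cross_val_score(lasso, X, y, cv=cv, scoring="r2")))  # True
```

In short, a metric computed on the pooled cross_val_predict output answers a different question from the fold-averaged cross_val_score. How far the two drift apart depends on the metric and on how the folds differ.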
