Accuracy Score ValueError: Can't Handle mix of binary and continuous target


Problem description

I'm using linear_model.LinearRegression from scikit-learn as a predictive model. It works, and it's perfect. The problem comes when I try to evaluate the predicted results using the accuracy_score metric.

This is my true data:

array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

My predicted data:

array([ 0.07094605,  0.1994941 ,  0.19270157,  0.13379635,  0.04654469,
    0.09212494,  0.19952108,  0.12884365,  0.15685076, -0.01274453,
    0.32167554,  0.32167554, -0.10023553,  0.09819648, -0.06755516,
    0.25390082,  0.17248324])

My code:

accuracy_score(y_true, y_pred, normalize=False)

The error message:

ValueError: Can't handle mix of binary and continuous target
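
For reference, the whole setup can be reproduced with a short sketch like the one below; the feature matrix X is synthetic here, since the original features are not shown in the question:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# Binary ground-truth labels, as given in the question
y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

# Synthetic feature matrix -- the real features are not shown in the question
rng = np.random.RandomState(0)
X = rng.rand(len(y_true), 3)

# LinearRegression returns continuous values, not 0/1 class labels
y_pred = LinearRegression().fit(X, y_true).predict(X)

# Comparing binary y_true with continuous y_pred raises the ValueError above
accuracy_score(y_true, y_pred, normalize=False)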

Recommended answer

Despite the plethora of wrong answers here that attempt to circumvent the error by numerically manipulating the predictions, the root cause of your error is a theoretical and not computational issue: you are trying to use a classification metric (accuracy) in a regression (i.e. numeric prediction) model (LinearRegression), which is meaningless.

Just like the majority of performance metrics, accuracy compares apples to apples (i.e. true labels of 0/1 with predictions that are again 0/1); so, when you ask the function to compare binary true labels (apples) with continuous predictions (oranges), you get an expected error, whose message tells you exactly what the problem is from a computational point of view:

Classification metrics can't handle a mix of binary and continuous target
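
As a quick illustration (a sketch, not part of the original question): the very same function works as soon as both arguments are discrete class labels, e.g. the kind of 0/1 output an actual classifier's predict() returns:

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])

# Hypothetical 0/1 predictions, as a classifier's predict() would return
y_pred_labels = np.array([1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0])

print(accuracy_score(y_true, y_pred_labels))                   # fraction of correct labels
print(accuracy_score(y_true, y_pred_labels, normalize=False))  # count of correct labels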

Although the message doesn't tell you directly that you are trying to compute a metric that is invalid for your problem (and we shouldn't actually expect it to go that far), it is certainly a good thing that scikit-learn at least gives you a direct and explicit warning that you are attempting something wrong; this is not necessarily the case with other frameworks - see, for example, the behavior of Keras in a very similar situation, where you get no warning at all and just end up complaining about low "accuracy" in a regression setting...

I am super-surprised by all the other answers here (including the accepted & highly upvoted one) that effectively suggest manipulating the predictions in order to simply get rid of the error; it's true that, once we end up with a set of numbers, we can certainly start fiddling with them in various ways (rounding, thresholding, etc.) to make our code behave, but this of course does not mean that our numeric manipulations are meaningful in the specific context of the ML problem we are trying to solve.

So, to wrap up: the problem is that you are applying a metric (accuracy) that is inappropriate for your model (LinearRegression): if you are in a classification setting, you should change your model (e.g. use LogisticRegression instead); if you are in a regression (i.e. numeric prediction) setting, you should change the metric. Check the list of metrics available in scikit-learn, where you can confirm that accuracy is used only in classification.
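
In code, the two legitimate options look roughly like this (a sketch on synthetic data, since neither the features nor a train/test split are part of the question):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                        # synthetic features (assumption)
y_class = (X[:, 0] > 0.5).astype(int)       # 0/1 labels      -> classification setting
y_reg = X @ np.array([1.0, -2.0, 0.5])      # numeric target  -> regression setting

# Option 1: classification setting -- change the model, keep the classification metric
clf = LogisticRegression().fit(X, y_class)
print(accuracy_score(y_class, clf.predict(X)))    # predict() returns 0/1 labels

# Option 2: regression setting -- keep LinearRegression, switch to regression metrics
reg = LinearRegression().fit(X, y_reg)
print(mean_squared_error(y_reg, reg.predict(X)))
print(r2_score(y_reg, reg.predict(X)))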

Compare also the situation with a recent SO question, where the OP is trying to get the accuracy of a list of models:

models = []
models.append(('SVM', svm.SVC()))
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
#models.append(('SGDRegressor', linear_model.SGDRegressor())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
#models.append(('BayesianRidge', linear_model.BayesianRidge())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
#models.append(('LassoLars', linear_model.LassoLars())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
#models.append(('ARDRegression', linear_model.ARDRegression())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
#models.append(('PassiveAggressiveRegressor', linear_model.PassiveAggressiveRegressor())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
#models.append(('TheilSenRegressor', linear_model.TheilSenRegressor())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets
#models.append(('LinearRegression', linear_model.LinearRegression())) #ValueError: Classification metrics can't handle a mix of binary and continuous targets

where the first 6 models work OK, while all the remaining (commented-out) ones give the same error. By now, you should be able to convince yourself that all the commented-out models are regression (and not classification) models, hence the justified error.
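
If in doubt, scikit-learn itself can tell you which estimators are classifiers (and therefore compatible with classification metrics such as accuracy); a quick sketch using the is_classifier helper:

from sklearn import linear_model, svm
from sklearn.base import is_classifier
from sklearn.linear_model import LogisticRegression

print(is_classifier(svm.SVC()))                         # True  -> classification model
print(is_classifier(LogisticRegression()))              # True
print(is_classifier(linear_model.LinearRegression()))   # False -> regression model
print(is_classifier(linear_model.BayesianRidge()))      # False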

A last important note: it may sound legitimate for someone to claim:

OK, but I want to use linear regression and then just round/threshold the outputs, effectively treating the predictions as "probabilities" and thus converting the model into a classifier

Actually, this has already been suggested, implicitly or explicitly, in several other answers here; again, this is an invalid approach (and the fact that you have negative predictions should already have alerted you that they cannot be interpreted as probabilities). Andrew Ng, in his popular Machine Learning course at Coursera, explains why this is a bad idea - see his Lecture 6.1 - Logistic Regression | Classification on YouTube (the explanation starts at ~3:00), as well as section 4.2, Why Not Linear Regression [for classification]?, of the (highly recommended and freely available) textbook An Introduction to Statistical Learning (http://www-bcf.usc.edu/~gareth/ISL/) by Hastie, Tibshirani and coworkers...
