Accuracy of multivariate classification and regression models with Scikit-Learn

Problem description

I wrote a simple linear regression model and a decision tree model, and both work well. My question is how to calculate the accuracy of these two models. What is the difference between calculating the accuracy of a classification model and of a regression model? Do I need to split the data into training and test sets?

Until now I have been using .score(x_test, y_test), but I read that this is not the accuracy of the model. I have tried to use metrics, but I always get this error:

ValueError: Found input variables with inconsistent numbers of samples: [2, 1]

Please check my code. I have tried to make it work, but I failed.

This is the code:

import pandas as pd
from sklearn import linear_model
from sklearn import tree
from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error


dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [101, 905, 182, 268, 646, 624, 465]}

df = pd.DataFrame(dic)

variables = df.iloc[:,:-1]
results = df.iloc[:,-1]

var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.2, random_state = 4)

regression = linear_model.LinearRegression()
regression.fit(var_train, res_train)

input_values = [14, 2]

prediction = regression.predict([input_values])
print(prediction)

accuracy_regression = mean_squared_error(var_test, prediction)
print(accuracy_regression)


dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']}

df = pd.DataFrame(dic)

variables = df.iloc[:,:-1]
results = df.iloc[:,-1]

var_train, var_test, res_train, res_test = train_test_split(variables, results, test_size = 0.2, random_state = 4)

decision_tree = tree.DecisionTreeClassifier()
decision_tree.fit(var_train, res_train)

input_values = [18, 2]

prediction = decision_tree.predict([input_values])[0]
print(prediction)

accuracy_classification = accuracy_score(res_test, prediction)
print(accuracy_classification)

Solution

Accuracy is a metric used for classification, not for regression. For regression, you can use R squared, negative mean squared error, and similar metrics. Accuracy is defined as the ratio of correctly classified data points to the total number of data points, so it is not used when the target is a continuous variable.
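The definition of accuracy above can be checked by hand. The following is a minimal sketch; the labels in y_true and y_pred are made up purely for illustration and do not come from the question.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels, chosen only to illustrate the definition.
y_true = np.array(['yes', 'no', 'yes', 'yes', 'no'])
y_pred = np.array(['yes', 'no', 'no', 'yes', 'yes'])

# Accuracy = correctly classified points / total number of points.
manual_accuracy = np.mean(y_true == y_pred)

# The library function should agree with the manual ratio.
library_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, library_accuracy)
```

Three of the five labels match, so both values come out as 0.6. The same ratio makes no sense for a continuous target, where a prediction is almost never exactly equal to the true value.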

You can use the following metrics to measure the predictive performance of a regression model: https://scikit-learn.org/stable/modules/classes.html#regression-metrics For example, you can compute R squared using

metrics.r2_score(y_true, y_pred[, …])
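To make the regression metrics concrete, here is a small sketch that computes R squared and the mean squared error from their usual definitions (R squared = 1 - SS_res / SS_tot) and compares them with the scikit-learn functions. The numbers in y_true and y_pred are illustrative values, not data from the question.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Illustrative values only.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares and total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

manual_r2 = 1.0 - ss_res / ss_tot
manual_mse = ss_res / len(y_true)

library_r2 = r2_score(y_true, y_pred)
library_mse = mean_squared_error(y_true, y_pred)

print(manual_r2, library_r2)    # both approximately 0.9486
print(manual_mse, library_mse)  # both 0.375
```

Both functions take the true targets first and the predictions second, and both arrays must have one entry per sample.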

Similarly, the following metrics are available for classification models: https://scikit-learn.org/stable/modules/classes.html#classification-metrics Accuracy can be computed using

metrics.accuracy_score(y_true, y_pred[, …])

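In your case, you can compute R squared for the regression model by predicting on the whole test set and comparing those predictions with the true test targets:

from sklearn import metrics
# var_test / res_test are the held-out split returned by train_test_split
y_pred_test = regression.predict(var_test)
metrics.r2_score(res_test, y_pred_test)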
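Putting the regression pieces together with the data from the question, a runnable sketch might look like the following. The ValueError in the question most likely arises because mean_squared_error was given var_test (two rows of features) and a single prediction, rather than the true test targets and one prediction per test row. No expected output is shown, because it depends on the split.

```python
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': [101, 905, 182, 268, 646, 624, 465]}
df = pd.DataFrame(dic)

variables = df.iloc[:, :-1]   # feature columns
results = df.iloc[:, -1]      # continuous target

var_train, var_test, res_train, res_test = train_test_split(
    variables, results, test_size=0.2, random_state=4)

regression = linear_model.LinearRegression()
regression.fit(var_train, res_train)

# Predict for every test row so y_pred_test lines up with res_test.
y_pred_test = regression.predict(var_test)

# Metrics compare true targets with predictions, never features.
mse = mean_squared_error(res_test, y_pred_test)
r2 = r2_score(res_test, y_pred_test)

print("MSE:", mse)
print("R squared:", r2)
# For regressors, .score() returns the same R squared value.
print("score():", regression.score(var_test, res_test))
```

With only seven rows the test set holds two samples, so these numbers are very noisy and should be read as a demonstration of the API rather than a meaningful evaluation.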

In the same way, the following gives you the accuracy of your decision tree on the test set:

# var_test / res_test here refer to the classification split
y_pred_test = decision_tree.predict(var_test)
metrics.accuracy_score(res_test, y_pred_test)
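The classification case can be sketched the same way with the data from the question. In the original code, prediction was a single label for the input [18, 2], which cannot be compared against the two labels in res_test. The random_state=0 argument on the classifier is an addition made here for reproducibility and is not part of the question.

```python
import pandas as pd
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

dic = {'par_1': [10, 30, 13, 19, 25, 33, 23],
       'par_2': [1, 3, 1, 2, 3, 3, 2],
       'outcome': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'yes']}
df = pd.DataFrame(dic)

variables = df.iloc[:, :-1]   # feature columns
results = df.iloc[:, -1]      # class labels

var_train, var_test, res_train, res_test = train_test_split(
    variables, results, test_size=0.2, random_state=4)

# random_state is an assumption added for repeatable results.
decision_tree = tree.DecisionTreeClassifier(random_state=0)
decision_tree.fit(var_train, res_train)

# One predicted label per test row.
y_pred_test = decision_tree.predict(var_test)

accuracy = accuracy_score(res_test, y_pred_test)
print("Accuracy:", accuracy)
# For classifiers, .score() returns the same mean accuracy.
print("score():", decision_tree.score(var_test, res_test))
```

Evaluating on var_test rather than var_train is what makes the split worthwhile: the score then reflects data the model did not see during fitting.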
