Getting a score of zero using cross_val_score

Question

I am trying to use cross_val_score on my dataset, but every score comes back as zero.

Here is my code:

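import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Creating dummies for the non-numerical features in the dataset
df = pd.read_csv("Flaveria.csv")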
df = pd.get_dummies(df, columns=["N level", "species"], drop_first=True)

# Extracting the target value from the dataset
X = df.iloc[:, df.columns != "Plant Weight(g)"]
y = np.array(df.iloc[:, 0], dtype="S6")

logreg = LogisticRegression()
loo = LeaveOneOut()

scores = cross_val_score(logreg, X, y, cv=loo)
print(scores)

The features are categorical, while the target is a float. I am not sure why I am getting ONLY zeros.

The data looks like this before creating the dummy variables:

N level,species,Plant Weight(g)
L,brownii,0.3008
L,brownii,0.3288
M,brownii,0.3304
M,brownii,0.388
M,brownii,0.406
H,brownii,0.3955
H,brownii,0.3797
H,brownii,0.2962

Updated code, where I am still getting zeros:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Creating dummies for the non-numerical features in the dataset
df = pd.read_csv("Flaveria.csv")
df = pd.get_dummies(df, columns=["N level", "species"], drop_first=True)

# Extracting the target value from the dataset
X = df.iloc[:, df.columns != "Plant Weight(g)"]
y = df.iloc[:, 0]

forest = RandomForestRegressor()
loo = LeaveOneOut()

scores = cross_val_score(forest, X, y, cv=loo)
print(scores)

Answer

In general, cross_val_score splits the data into train and test folds using the given iterator, fits the model on the training folds, and scores it on the test fold. For regressors, the default score in scikit-learn is r2_score.
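As a rough illustration of that description, the loop below re-creates the split, fit, and score steps by hand and compares the result with cross_val_score. It is a minimal sketch, not scikit-learn's actual implementation: the data is synthetic (not from the question), and LinearRegression with a 3-fold KFold is an arbitrary choice made only to keep the example deterministic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data for illustration only (an assumption, not the Flaveria dataset).
X = np.arange(12, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + np.sin(X.ravel())

model = LinearRegression()
cv = KFold(n_splits=3)

# Manual version: split, fit a fresh copy on the training part, score on the test part.
manual_scores = []
for train_idx, test_idx in cv.split(X):
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    # For a regressor, .score() returns R2 on the given samples.
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

# Library version with the same estimator and the same splitter.
library_scores = cross_val_score(model, X, y, cv=cv)

print(manual_scores)
print(library_scores)
```

Each entry in the printed arrays is the R2 of one test fold, which is why the size of the test fold matters for the explanation that follows.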

You have specified LeaveOneOut() as your cv iterator, so each test fold contains a single sample. In that case R2 comes out as 0 for every fold, which matches the zeros you are seeing.

Look at the formula for R2 on Wikipedia:

R2 = 1 - (SS_res / SS_tot)

where

SS_tot = sum((y - y_mean)^2)

With a single test sample, y_mean equals the one y value, so SS_tot (the denominator) is 0 and R2 is undefined (NaN). In this case, scikit-learn sets the score to 0 instead of NaN.
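The arithmetic can be checked by hand. The sketch below computes the two sums from the R2 formula in plain Python; the helper name r2_terms and the prediction value 0.35 are made up for illustration, while the y values are taken from the sample rows in the question. The exact value scikit-learn reports for this undefined case (0 or NaN) may differ between releases, so it is worth checking the version you have installed.

```python
def r2_terms(y_true, y_pred):
    """Return (SS_res, SS_tot), the two sums used in R2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return ss_res, ss_tot


# One test sample, as in every LeaveOneOut fold: the mean is the sample itself.
single_res, single_tot = r2_terms([0.3008], [0.35])

# Two test samples with different targets: the denominator is positive.
pair_res, pair_tot = r2_terms([0.3008, 0.3288], [0.35, 0.35])

print("single-sample SS_tot:", single_tot)
print("two-sample SS_tot is positive:", pair_tot > 0)
```

With SS_tot equal to zero, the ratio SS_res / SS_tot cannot be evaluated, no matter how good or bad the prediction is.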

Changing LeaveOneOut() to any other CV iterator, such as KFold, will give you non-zero results, as you have already observed.
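A minimal sketch of that change, using only the eight sample rows shown in the question (inlined so it runs without Flaveria.csv). The fold count, random_state values, and n_estimators are arbitrary assumptions. The second call is a suggestion that goes beyond the answer above: if you want to keep LeaveOneOut, a per-sample error metric such as "neg_mean_absolute_error" stays well-defined on a single test sample.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# The sample rows from the question; the real file may contain more data.
CSV_TEXT = """N level,species,Plant Weight(g)
L,brownii,0.3008
L,brownii,0.3288
M,brownii,0.3304
M,brownii,0.388
M,brownii,0.406
H,brownii,0.3955
H,brownii,0.3797
H,brownii,0.2962
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
df = pd.get_dummies(df, columns=["N level", "species"], drop_first=True, dtype=float)

# Select features and target by column name rather than by position.
X = df.drop(columns=["Plant Weight(g)"])
y = df["Plant Weight(g)"]

forest = RandomForestRegressor(n_estimators=50, random_state=0)

# KFold with 4 splits puts two samples in each test fold, so R2 is defined.
kfold = KFold(n_splits=4, shuffle=True, random_state=0)
r2_scores = cross_val_score(forest, X, y, cv=kfold)

# Alternative (assumption, not from the answer): keep LeaveOneOut but score
# with negated mean absolute error, which works on one sample at a time.
mae_scores = cross_val_score(
    forest, X, y, cv=LeaveOneOut(), scoring="neg_mean_absolute_error"
)

print("R2 per KFold fold:", r2_scores)
print("Mean absolute error over LeaveOneOut folds:", -mae_scores.mean())
```

On a dataset this small the R2 values are noisy and can be negative, so they should not be read as a reliable estimate of model quality.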
