My accuracy is at 0.0 and I don't know why?


Problem description

I am getting an accuracy of 0.0. I am using the Boston housing dataset.

Here is my code:

import sklearn
from sklearn import datasets
from sklearn import svm, metrics
from sklearn import linear_model, preprocessing
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
boston = datasets.load_boston()

x = boston.data
y = boston.target

train_data, test_data, train_label, test_label = sklearn.model_selection.train_test_split(x, y, test_size=0.2)

model = KNeighborsClassifier()

lab_enc = preprocessing.LabelEncoder()
train_label_encoded = lab_enc.fit_transform(train_label)
test_label_encoded = lab_enc.fit_transform(test_label)

model.fit(train_data, train_label_encoded)
predicted = model.predict(test_data)
accuracy = model.score(test_data, test_label_encoded)
print(accuracy)

How can I increase the accuracy on this dataset?

Recommended answer

The Boston dataset is meant for regression problems. Its definition in the docs:

Load and return the boston house-prices dataset (regression).

So it makes no sense to encode the targets as ordinary class labels, as if they were not samples of a continuous quantity. For example, 12.3 and 12.4 are encoded as two completely different labels even though they are very close to each other, and a prediction of 12.4 when the real target is 12.3 is scored as simply wrong. But this is not a binary situation. In classification a prediction is either correct or not; in regression the error is measured differently, for example with mean squared error.
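To make the regression framing concrete, here is a minimal sketch that swaps the classifier for `KNeighborsRegressor` and scores it with `mean_squared_error`. The synthetic data from `make_regression` is an assumption used only so the snippet runs on its own (`load_boston` was removed in scikit-learn 1.2); the sample count, noise level, and `n_neighbors=5` are illustrative choices, not values from the question.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for a dataset with a continuous target (an assumption
# for illustration; replace X and y with your own features and targets).
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A regressor predicts continuous values directly, so no label encoding is needed.
model = KNeighborsRegressor(n_neighbors=5)
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Mean squared error grows with the distance between prediction and target,
# so a near miss costs far less than a large miss.
mse = mean_squared_error(y_test, predicted)

# For regressors, score() returns the R^2 coefficient, not accuracy.
r2 = model.score(X_test, y_test)
print(f"MSE: {mse:.2f}, R^2: {r2:.3f}")
```

With a metric like this, predicting 12.4 for a target of 12.3 contributes only a tiny error instead of counting as a complete miss.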

This part is not necessary, but I would like to give an example using the same dataset and source code. A simple idea, rounding the labels toward zero (that is, dropping the fractional part), will give you some intuition:

5.0-5.9 -> 5
6.0-6.9 -> 6
...
50.0-50.9 -> 50
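The mapping above is what Python's built-in `int()` does to a float. A small sketch with made-up prices (the values are illustrative, not taken from the dataset):

```python
import math

# Made-up example prices; int() drops the fractional part (rounds toward zero).
prices = [5.0, 5.9, 12.3, 12.4, 50.0]
bins = [int(p) for p in prices]
print(bins)  # [5, 5, 12, 12, 50]

# "Toward zero" differs from floor only for negative values, which do not
# occur among house prices.
toward_zero = int(-1.5)        # -1
floored = math.floor(-1.5)     # -2
```

Note that 12.3 and 12.4 now share the label 12, so a classifier is no longer penalised for confusing them.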

Let's change your code a little bit.

import numpy as np

def encode_func(labels):
    # int() truncates toward zero, so 12.3 and 12.4 both become 12
    return np.array([int(l) for l in labels])

...

train_label_encoded = encode_func(train_label)
test_label_encoded = encode_func(test_label)

The accuracy will then be around 10%.
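A separate detail in the question's code also matters: calling `fit_transform` on the test labels refits the `LabelEncoder`, so the same price can receive a different integer in the training and test sets. The sketch below uses made-up label arrays to show the effect; it illustrates the pitfall and is not a recommendation to label-encode continuous targets.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up labels for illustration only.
train_label = np.array([10.0, 12.3, 50.0])
test_label = np.array([12.3, 50.0])

enc = LabelEncoder()
train_encoded = enc.fit_transform(train_label)          # 12.3 -> 1, 50.0 -> 2

# Refitting on the test labels builds a new, unrelated mapping.
refit_test = LabelEncoder().fit_transform(test_label)   # 12.3 -> 0, 50.0 -> 1

# Reusing the fitted encoder keeps the mapping consistent.
consistent_test = enc.transform(test_label)             # 12.3 -> 1, 50.0 -> 2

print(train_encoded, refit_test, consistent_test)
```

Even the consistent variant is fragile here, because `transform` raises a `ValueError` for any test value that never appeared in the training labels, which is common with continuous targets.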
