ValueError: Expected n_neighbors <= 1. Got 5 - Scikit K Nearest Classifier


Problem description


I'm using scikit-learn's KNN classifier with Levenshtein distance to do some work on strings, much like the example at the bottom of this page: http://scikit-learn.org/stable/faq.html . The difference is that my data is split into training and test sets and stored in a DataFrame.

Here is the split:

train_feature, test_feature, train_class, test_class = train_test_split(features, classes,
                                                    test_size=TEST_SET_SIZE, train_size=TRAINING_SET_SIZE,
                                                    random_state=42)

I have the following:

>>> model = KNeighborsClassifier(metric='pyfunc',func=machine_learning.custom_distance)
>>> model.fit(train_feature['id'], train_class.as_matrix(['gender']))
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='pyfunc',
       metric_params={'func': <function custom_distance at 0x7fd0236267b8>},
       n_neighbors=5, p=2, weights='uniform')


Here train_feature is a DataFrame with a single column, id ([24000 rows x 1 columns]), and train_class (Name: gender, dtype: object) is a Series holding each sample's gender, either 'M' or 'F'. Each id corresponds to a key in a dict defined elsewhere.

The custom distance function is:

def custom_distance(x, y):
    # x and y are rows of the feature matrix; column 0 holds the id
    i, j = int(x[0]), int(y[0])
    return damerau_levenshtein_distance(lookup_dict[i], lookup_dict[j])
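For reference, here is a minimal, self-contained sketch of the same id-lookup idea against the current scikit-learn API, where the callable is passed directly as `metric` rather than via `metric='pyfunc', func=...`. The toy `lookup_dict`, the labels, and the plain `levenshtein` function (swapped in for `damerau_levenshtein_distance`, whose source package the question does not name) are all hypothetical. Like the FAQ example, it builds X as a column of ids with `reshape(-1, 1)` and sets `algorithm='brute'`, since the default tree-based search assumes a continuous feature space.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: the strings live outside the feature matrix,
# and every sample is represented only by its integer key.
lookup_dict = {0: "anna", 1: "anne", 2: "hannah", 3: "bob", 4: "rob",
               5: "robert", 6: "hanna", 7: "robby"}
train_labels = np.array(["F", "F", "F", "M", "M", "M"])


def levenshtein(a, b):
    """Plain Levenshtein edit distance (a stand-in for damerau_levenshtein_distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free on a match)
        prev = cur
    return prev[-1]


def custom_distance(x, y):
    # x and y are single rows of the feature matrix; column 0 holds the id.
    i, j = int(x[0]), int(y[0])
    return levenshtein(lookup_dict[i], lookup_dict[j])


# One row per sample, one column (the id): shape (6, 1).
X_train = np.arange(6).reshape(-1, 1)
X_query = np.array([[6], [7]])  # ids of "hanna" and "robby"

model = KNeighborsClassifier(n_neighbors=3, metric=custom_distance,
                             algorithm="brute")
model.fit(X_train, train_labels)
print(model.predict(X_query))  # ['F' 'M']
```

The three nearest training strings to "hanna" are "anna", "hannah" and "anne" (all 'F'), and to "robby" they are "rob", "bob" and "robert" (all 'M').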


When I try to get the accuracy of the model:

 accuracy = model.score(test_feature, test_class)

I get this error:

 ValueError: Expected n_neighbors <= 1. Got 5
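The error can be reproduced in isolation. This sketch uses hypothetical toy data: it fits a classifier on a single row of 10 ids (one sample with 10 features, which is what an id vector becomes if it is read as a row) and then runs the neighbour lookup with the default n_neighbors=5. The exact wording of the message varies between scikit-learn versions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: 10 ids laid out as ONE row, i.e. 1 sample with
# 10 features, which is how a 1-D id vector ends up if read as a row vector.
X_single_row = np.arange(10).reshape(1, -1)
y_single = np.array(["M"])

model = KNeighborsClassifier()      # n_neighbors defaults to 5
model.fit(X_single_row, y_single)   # fitting succeeds; only 1 sample is stored

try:
    # The neighbour lookup that prediction and scoring depend on:
    # 5 neighbours are requested, but only 1 training sample exists.
    model.kneighbors(X_single_row)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```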


I'm honestly really confused. I've checked the length of each of my datasets and they are fine. Why would it be telling me I only have one data point to plot from? Any help would be greatly appreciated.

Recommended answer


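The classifier thinks that your dataset has only a single entry. It probably interprets the vector of ids as a row vector instead of a column vector. train_feature['id'] is a one-dimensional Series, so it can be read as one sample with 24000 features rather than 24000 samples with one feature each; with only one stored sample, the classifier cannot find the default 5 neighbours.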
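The difference between single and double brackets shows up directly in the array shapes. In this sketch `train_feature` is a hypothetical 24000-row stand-in, and `np.atleast_2d` illustrates what, as far as we understand, scikit-learn releases of that period (around 0.17) did with 1-D input. Current releases reject 1-D X instead, with "Expected 2D array, got 1D array instead".

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_feature: 24000 rows, one 'id' column.
train_feature = pd.DataFrame({"id": np.arange(24000)})

ids_1d = train_feature["id"].to_numpy()     # single brackets -> 1-D, shape (24000,)
row = np.atleast_2d(ids_1d)                 # promoted to one row: shape (1, 24000)
column = train_feature[["id"]].to_numpy()   # double brackets -> 2-D, shape (24000, 1)

print(ids_1d.shape, row.shape, column.shape)
# (24000,) (1, 24000) (24000, 1)
```

`ids_1d.reshape(-1, 1)` produces the same (24000, 1) column as the double-bracket selection.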

Try

model.fit(train_feature.as_matrix(['id']), train_class.as_matrix(['gender']))

and see if that helps.
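`DataFrame.as_matrix` was deprecated in pandas 0.23 and removed in pandas 1.0, so on current versions the same fix is usually written with double brackets and `to_numpy()`. This is a minimal sketch with hypothetical toy data; the default Euclidean metric is used only to keep it short, and in the question's setting the custom distance would be passed as the metric.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data standing in for the question's train_feature/train_class.
train_feature = pd.DataFrame({"id": [0, 1, 2, 3, 4, 5]})
train_class = pd.Series(["F", "F", "F", "M", "M", "M"], name="gender")

X = train_feature[["id"]].to_numpy()  # double brackets: shape (6, 1), 6 samples
y = train_class.to_numpy()            # shape (6,): one label per sample

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X, y)

# With 6 stored samples, asking for 5 neighbours is now valid.
distances, indices = model.kneighbors(X[:1])
print(indices.shape)  # (1, 5)
```

Passing `train_feature[["id"]]` and `train_class` to `fit` directly also works, since scikit-learn converts them to arrays itself; what matters is that X is two-dimensional.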
