Using Pandas and Sklearn.Neighbors


Problem description

I'm trying to fit a KNN model on a dataframe, using Python 3.5/Pandas/Sklearn.neighbors. I've imported the data, split it into training and testing data and labels, but when I try to predict using it, I get the following error. I'm quite new to Pandas so any help would be appreciated, thanks!

import numpy as np
import pandas as pd
from sklearn import cross_validation
from sklearn.neighbors import KNeighborsRegressor

seeds = pd.read_csv('seeds.tsv', sep='\t', names=['Area', 'Perimeter', 'Compactness', 'Kern_len', 'Kern_width', 'Assymetry', 'Kern_groovlen', 'Species'])
data = seeds.iloc[:, [0, 1, 2, 3, 4, 5, 6]]
labels = seeds.iloc[:, [7]]
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data, labels, test_size=0.4, random_state=1)
knn = KNeighborsRegressor(n_neighbors=30)
knn.fit(x_train, y_train)
knn.predict(x_test)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-121-2292e64e5ab8> in <module>()
----> 1 knn.predict(x_test)

C:\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X)
    151 
    152         if weights is None:
--> 153             y_pred = np.mean(_y[neigh_ind], axis=1)
    154         else:
    155             y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float)

C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims)
   2876 
   2877     return _methods._mean(a, axis=axis, dtype=dtype,
-> 2878                           out=out, keepdims=keepdims)
   2879 
   2880 

C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
     66     if isinstance(ret, mu.ndarray):
     67         ret = um.true_divide(
---> 68                 ret, rcount, out=ret, casting='unsafe', subok=False)
     69     elif hasattr(ret, 'dtype'):
     70         ret = ret.dtype.type(ret / rcount)

TypeError: unsupported operand type(s) for /: 'str' and 'int'

Solution

You should be using KNeighborsClassifier for this KNN. Species is a categorical label, so this is a classification problem. The KNeighborsRegressor in your code predicts continuous numeric values by averaging the targets of the nearest neighbors, and averaging string labels is what raises the TypeError in your traceback.
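To illustrate the difference, here is a minimal pure-Python sketch of how the two kinds of estimator combine the targets of the nearest neighbors. It is not scikit-learn's implementation; the helper names `regress_from_neighbors` and `classify_from_neighbors` and the labels 'A' and 'B' are made up for this example.

```python
from collections import Counter


def regress_from_neighbors(neighbor_targets):
    """Regression-style prediction: average the neighbors' targets."""
    return sum(neighbor_targets) / len(neighbor_targets)


def classify_from_neighbors(neighbor_labels):
    """Classification-style prediction: majority vote over the neighbors' labels."""
    return Counter(neighbor_labels).most_common(1)[0][0]


# Averaging works for numeric targets.
print(regress_from_neighbors([1.0, 2.0, 3.0]))  # 2.0

# Majority voting works for string labels.
print(classify_from_neighbors(['A', 'B', 'A']))  # A

# Averaging string labels fails with a TypeError, much like the regressor above.
try:
    regress_from_neighbors(['A', 'B', 'A'])
except TypeError as exc:
    print('TypeError:', exc)
```

The regressor needs arithmetic on the targets, while the classifier only needs to count them, which is why string labels are fine for the latter.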

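import pandas as pd
# scikit-learn >= 0.18; older releases expose this as sklearn.cross_validation
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
seeds = pd.read_csv('seeds.tsv', sep='\t', names=['Area', 'Perimeter', 'Compactness', 'Kern_len', 'Kern_width', 'Assymetry', 'Kern_groovlen', 'Species'])
data = seeds.iloc[:, [0, 1, 2, 3, 4, 5, 6]]
labels = seeds.iloc[:, 7]  # 1-D Series of class labels
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.4, random_state=1)
knn = KNeighborsClassifier(n_neighbors=30)
knn.fit(x_train, y_train)
knn.predict(x_test)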
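If you want to check that a classifier accepts string labels without having seeds.tsv at hand, something like the following self-contained sketch may help. The data here is synthetic (two random clusters labelled 'A' and 'B', both invented for this example), and `n_neighbors=5` is an arbitrary choice for such a small set. It assumes a scikit-learn version that provides `sklearn.model_selection`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the seeds data: two well-separated clusters
# with string class labels (hypothetical, not the real dataset).
rng = np.random.RandomState(1)
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=10.0, scale=1.0, size=(50, 2)),
])
labels = np.array(['A'] * 50 + ['B'] * 50)

x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.4, random_state=1)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

predictions = knn.predict(x_test)     # array of string labels
accuracy = knn.score(x_test, y_test)  # fraction of correct predictions
print(predictions[:5], accuracy)
```

`predict` returns the string labels themselves, and `score` reports the mean accuracy on the held-out split.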

http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html

Here is what the regressor would plot compared to the classifier (which you want to use).
