Using Pandas and Sklearn.Neighbors
Problem description
I'm trying to fit a KNN model on a dataframe, using Python 3.5/Pandas/Sklearn.neighbors. I've imported the data, split it into training and testing data and labels, but when I try to predict using it, I get the following error. I'm quite new to Pandas so any help would be appreciated, thanks!
import pandas as pd
from sklearn import cross_validation
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
seeds = pd.read_csv('seeds.tsv',sep='\t',names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species'])
data = seeds.iloc[:,[0,1,2,3,4,5,6]]
labels = seeds.iloc[:,[7]]
x_train, x_test, y_train, y_test = cross_validation.train_test_split(data,labels, test_size=0.4, random_state=1 )
knn = KNeighborsRegressor(n_neighbors=30)
knn.fit(x_train,y_train)
knn.predict(x_test)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-121-2292e64e5ab8> in <module>()
----> 1 knn.predict(x_test)
C:\Anaconda3\lib\site-packages\sklearn\neighbors\regression.py in predict(self, X)
151
152 if weights is None:
--> 153 y_pred = np.mean(_y[neigh_ind], axis=1)
154 else:
155 y_pred = np.empty((X.shape[0], _y.shape[1]), dtype=np.float)
C:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims)
2876
2877 return _methods._mean(a, axis=axis, dtype=dtype,
-> 2878 out=out, keepdims=keepdims)
2879
2880
C:\Anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
66 if isinstance(ret, mu.ndarray):
67 ret = um.true_divide(
---> 68 ret, rcount, out=ret, casting='unsafe', subok=False)
69 elif hasattr(ret, 'dtype'):
70 ret = ret.dtype.type(ret / rcount)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
You should be using KNeighborsClassifier for this KNN, because you are trying to predict the label Species, which is a classification task. The KNeighborsRegressor in your code is meant for continuous numerical targets: it predicts by averaging the target values of the nearest neighbors (the np.mean call in your traceback), and averaging string labels raises the TypeError you see.
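To make the difference concrete, here is a minimal sketch of the two prediction rules, written with plain NumPy rather than scikit-learn itself. The label values and the helper names `regress` and `classify` are hypothetical and only illustrate the idea; scikit-learn's real implementation is more involved.

```python
from collections import Counter

import numpy as np

# Hypothetical labels of the 3 nearest neighbours of one test point,
# stored as strings the way a text 'Species' column would be.
neighbor_labels = np.array(["species_a", "species_b", "species_a"], dtype=object)


def regress(labels):
    """Regression-style prediction: average the neighbours' targets."""
    return np.mean(labels)


def classify(labels):
    """Classification-style prediction: majority vote among the neighbours."""
    return Counter(labels).most_common(1)[0][0]


try:
    regress(neighbor_labels)
    regression_error = None
except TypeError as exc:
    # Strings can be concatenated but not divided, so the mean fails.
    regression_error = exc

predicted = classify(neighbor_labels)
print("regressor:", repr(regression_error))
print("classifier:", predicted)
```

Averaging only makes sense for numbers, while voting works for any kind of label. Applied to your code, the change looks like this: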
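import pandas as pd
# train_test_split lives in sklearn.model_selection (sklearn.cross_validation in older releases)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
seeds = pd.read_csv('seeds.tsv', sep='\t', names=['Area','Perimeter','Compactness','Kern_len','Kern_width','Assymetry','Kern_groovlen','Species'])
data = seeds.iloc[:, :7]   # the seven feature columns
labels = seeds.iloc[:, 7]  # Species as a 1-D Series of class labels
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.4, random_state=1)
knn = KNeighborsClassifier(n_neighbors=30)
knn.fit(x_train, y_train)
knn.predict(x_test)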
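Since seeds.tsv is not included here, the following is a self-contained sketch of the same workflow on synthetic stand-in data. The column names `f1` and `f2`, the two clusters, the labels "a" and "b", and `n_neighbors=5` are all assumptions made for illustration; they are not taken from the seeds dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for seeds.tsv: two well-separated clusters
# with string class labels (hypothetical data, not the real file).
rng = np.random.RandomState(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
frame = pd.DataFrame(features, columns=["f1", "f2"])
frame["Species"] = ["a"] * 50 + ["b"] * 50

data = frame[["f1", "f2"]]
labels = frame["Species"]  # 1-D Series, not a one-column DataFrame

x_train, x_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.4, random_state=1
)

# The classifier accepts string labels directly and predicts by majority vote.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)
accuracy = knn.score(x_test, y_test)  # fraction of test labels predicted correctly
print(predictions[:5], accuracy)
```

The predictions come back as the original string labels, so no manual encoding of Species is needed before fitting.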
For a picture of what the classifier (which you want to use) produces, as opposed to the regressor, see the scikit-learn example: http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html