k nearest neighbors with cross validation for accuracy score and confusion matrix
Problem description
I have the following data, in which each column is one sample: the rows with numbers are the input, and the letter in the header row is the output (the class label).
A,A,A,B,B,B
-0.979090189,0.338819904,-0.253746508,0.213454999,-0.580601104,-0.441683968
-0.48395313,0.436456904,-1.427424032,-0.107093825,0.320813402,0.060866105
-1.098818173,-0.999161692,-1.371721698,-1.057324962,-1.161752652,-0.854872591
-1.53191442,-1.465454248,-1.350414216,-1.732518018,-1.674040715,-1.561568496
2.522796162,2.498153298,3.11756171,2.125738509,3.003929536,2.514411247
-0.060161596,-0.487513844,-1.083513761,-0.908023322,-1.047536921,-0.48276759
0.241962669,0.181365373,0.174042637,-0.048013217,-0.177434916,0.42738621
-0.603856395,-1.020531402,-1.091134021,-0.863008165,-0.683233589,-0.849059931
-0.626159165,-0.348144322,-0.518640038,-0.394482485,-0.249935646,-0.543947259
-1.407263942,-1.387660115,-1.612988118,-1.141282747,-0.944745366,-1.030944216
-0.682567673,-0.043613473,-0.105679403,0.135431139,0.059104888,-0.132060832
-1.10107164,-1.030047313,-1.239075022,-0.651818656,-1.043589073,-0.765992541
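One property of this layout is easy to miss: pandas does not keep repeated header names as they are. read_csv makes them unique by appending .1, .2 and so on, so df.columns.values in the code below yields six different labels rather than three As and three Bs. The sketch reads the header and the first two data rows from an in-memory string (io.StringIO stands in for data.csv, an assumption made only so the example is self-contained) and then strips the suffix to recover the intended labels.

```python
import io

import pandas as pd

# Header plus the first two data rows from the question, read from memory
# instead of from data.csv
text = """A,A,A,B,B,B
-0.979090189,0.338819904,-0.253746508,0.213454999,-0.580601104,-0.441683968
-0.48395313,0.436456904,-1.427424032,-0.107093825,0.320813402,0.060866105
"""
df = pd.read_csv(io.StringIO(text))

# pandas de-duplicates repeated column names by appending ".1", ".2", ...
print(list(df.columns))  # ['A', 'A.1', 'A.2', 'B', 'B.1', 'B.2']

# Dropping everything from the first "." recovers the intended class labels
labels = [name.split(".")[0] for name in df.columns]
print(labels)  # ['A', 'A', 'A', 'B', 'B', 'B']
```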
I am trying to perform KNN with leave-one-out cross-validation (LOOCV) to get an accuracy score and a confusion matrix.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneOut
import pandas as pd


def main():
    csv = 'data.csv'
    df = pd.read_csv(csv)
    X = df.values.T
    y = df.columns.values
    clf = KNeighborsClassifier()
    loo = LeaveOneOut()
    for train_index, test_index in loo.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        clf.fit(X_train, y_train)
        y_true = y_test
        y_pred = clf.predict(X_test)
        ac = accuracy_score(y_true, y_pred)
        cm = confusion_matrix(y_true, y_pred)
        print(ac)
        print(cm)


if __name__ == '__main__':
    main()
However, my results are all 0s. Where am I going wrong?
Recommended answer
I think your model is not being trained properly: with leave-one-out, each test fold contains a single sample, so the model has only one value to guess, and it does not get it right. I suggest switching to KFold or StratifiedKFold. LOO also has the disadvantage that it fits one model per sample, so on large datasets it becomes extremely time-consuming. Here is what happened when I ran StratifiedKFold with 3 splits on your X data. Instead of using A and B, I filled y randomly with 0s and 1s (as the last column of the CSV), and I did not transpose the data, so it has 12 rows:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape
csv = r'C:\df_low_X.csv'
df = pd.read_csv(csv, header=None)
print(df)
# Every column except the last is a feature; the last column holds the 0/1 labels
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
clf = KNeighborsClassifier()
kf = StratifiedKFold(n_splits=3)
ac = []
cm = []
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print(X_train, X_test)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    # Keep one score and one confusion matrix per fold
    ac.append(accuracy_score(y_test, y_pred))
    cm.append(confusion_matrix(y_test, y_pred))
print(ac)
print(cm)
# ac
[0.25, 0.75, 0.5]
# cm
[array([[1, 1],
        [2, 0]], dtype=int64),
 array([[1, 1],
        [0, 2]], dtype=int64),
 array([[0, 2],
        [0, 2]], dtype=int64)]
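For comparison, here is a minimal sketch that keeps leave-one-out on the question's original layout (six samples, one per column). It changes three things, each an assumption on my part rather than part of the answer above. First, the labels are recovered by stripping pandas' .1/.2 suffixes; with the suffixes left in, every held-out sample has a label that never appears in its training fold, so it can never be predicted correctly. Second, with the default n_neighbors=5 and only five training points per fold, every prediction is simply the majority class of the training fold, which is always the class opposite to the held-out sample, so accuracy is 0 even with correct labels; n_neighbors=1 is a hypothetical choice for illustration, not a tuned value. Third, scikit-learn's cross_val_predict pools the predictions from all folds so that one accuracy and one confusion matrix are computed, instead of a per-fold score that can only be 0 or 1. The CSV is inlined through io.StringIO so the sketch runs on its own.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# The question's data.csv, inlined so the sketch is self-contained
text = """A,A,A,B,B,B
-0.979090189,0.338819904,-0.253746508,0.213454999,-0.580601104,-0.441683968
-0.48395313,0.436456904,-1.427424032,-0.107093825,0.320813402,0.060866105
-1.098818173,-0.999161692,-1.371721698,-1.057324962,-1.161752652,-0.854872591
-1.53191442,-1.465454248,-1.350414216,-1.732518018,-1.674040715,-1.561568496
2.522796162,2.498153298,3.11756171,2.125738509,3.003929536,2.514411247
-0.060161596,-0.487513844,-1.083513761,-0.908023322,-1.047536921,-0.48276759
0.241962669,0.181365373,0.174042637,-0.048013217,-0.177434916,0.42738621
-0.603856395,-1.020531402,-1.091134021,-0.863008165,-0.683233589,-0.849059931
-0.626159165,-0.348144322,-0.518640038,-0.394482485,-0.249935646,-0.543947259
-1.407263942,-1.387660115,-1.612988118,-1.141282747,-0.944745366,-1.030944216
-0.682567673,-0.043613473,-0.105679403,0.135431139,0.059104888,-0.132060832
-1.10107164,-1.030047313,-1.239075022,-0.651818656,-1.043589073,-0.765992541
"""
df = pd.read_csv(io.StringIO(text))

# One sample per column; strip pandas' ".1", ".2" suffixes to get A/B labels
X = df.values.T
y = [name.split(".")[0] for name in df.columns]

loo = LeaveOneOut()

# Default k=5 with only 5 training points: each prediction is the training
# fold's majority class, which is always the opposite of the held-out class
y_pred_k5 = cross_val_predict(KNeighborsClassifier(), X, y, cv=loo)
print(accuracy_score(y, y_pred_k5))  # 0.0

# Hypothetical k=1: pool all six LOO predictions, then score them once
clf = KNeighborsClassifier(n_neighbors=1)
y_pred = cross_val_predict(clf, X, y, cv=loo)
print(accuracy_score(y, y_pred))
print(confusion_matrix(y, y_pred, labels=["A", "B"]))
```

The per-fold lists from the StratifiedKFold code can be pooled in the same spirit: summing the three matrices in cm element-wise (for example with numpy.sum(cm, axis=0)) gives a single matrix for all 12 samples, [[2, 4], [2, 4]] for the output above, and its diagonal total divided by 12 (0.5) equals the mean of ac because every fold has 4 test samples.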