Confusion matrix and classification report of StratifiedKFold


Problem description

I am using StratifiedKFold to check the performance of my classifier. I have two classes and I am trying to build a Logistic Regression classifier. Here is my code:

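import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# x (raw text documents) and y (labels) are assumed to be NumPy arrays
r = []  # per-fold accuracy scores
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the vectorizer on the training fold only, then apply it to the test fold
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x_train)
    x_test = tfidf.transform(x_test)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    score = accuracy_score(y_test, y_pred)
    r.append(score)
    print(score)

print(np.mean(r))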

I can print the performance score, but I can't figure out how to print the confusion matrix and the classification report. If I just add a print statement inside the loop,

print(confusion_matrix(y_test, y_pred))

it prints 10 times, once per fold, but I want a single report and a single matrix for the final performance of the classifier.

Any help on how to calculate the matrix and the report would be appreciated. Thanks.

Recommended answer

Cross-validation is used to assess the performance of particular models or hyperparameters across different splits of a dataset. At the end you don't have a final performance per se; you have the individual performance on each split and the aggregated performance across splits. You could potentially use the tn, fn, fp and tp counts from each split to compute an aggregated precision, recall, sensitivity, etc., but you could also just use the predefined functions for those metrics in sklearn and aggregate them at the end.

For example:

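import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

# x (raw text documents) and y (labels) are assumed to be NumPy arrays
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accs, precs, recs = [], [], []
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the vectorizer on the training fold only, then apply it to the test fold
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x_train)
    x_test = tfidf.transform(x_test)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    # precision_score and recall_score default to binary averaging with pos_label=1
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    accs.append(acc)
    precs.append(prec)
    recs.append(rec)
    print(f'Accuracy: {acc}, Precision: {prec}, Recall: {rec}')

print(f'Mean Accuracy: {np.mean(accs)}, Mean Precision: {np.mean(precs)}, Mean Recall: {np.mean(recs)}')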
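
If a single matrix and a single report are still wanted, one option that is not part of the original answer is to sum the per-fold confusion matrices and to build the classification report from the pooled out-of-fold predictions. The following is a minimal sketch under that assumption; the helper name pooled_cv_report and the toy corpus are hypothetical and only there to make the example runnable. Note that pooled metrics are not the same quantity as the mean of per-fold metrics, so it is worth stating which one is being reported.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline


def pooled_cv_report(x, y, n_splits=10, random_state=0):
    """Hypothetical helper: sum per-fold confusion matrices and pool predictions."""
    x, y = np.asarray(x), np.asarray(y)
    labels = np.unique(y)  # fixed label order so every fold's matrix has the same shape
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    total_cm = np.zeros((len(labels), len(labels)), dtype=int)
    y_true_all, y_pred_all = [], []
    for train_index, test_index in skf.split(x, y):
        # A fresh pipeline per fold keeps the TF-IDF vocabulary fitted on training data only
        model = make_pipeline(TfidfVectorizer(),
                              LogisticRegression(class_weight='balanced'))
        model.fit(x[train_index], y[train_index])
        y_pred = model.predict(x[test_index])
        total_cm += confusion_matrix(y[test_index], y_pred, labels=labels)
        y_true_all.extend(y[test_index])
        y_pred_all.extend(y_pred)
    # One report over all out-of-fold predictions (each sample is predicted exactly once)
    report = classification_report(y_true_all, y_pred_all, labels=labels, zero_division=0)
    return total_cm, report


# Hypothetical toy corpus, only to make the sketch runnable
docs = ["good great fine"] * 10 + ["bad awful poor"] * 10
targets = [1] * 10 + [0] * 10
cm, report = pooled_cv_report(docs, targets, n_splits=5)
print(cm)
print(report)
```

Because every sample lands in exactly one test fold, the summed matrix has as many entries as there are samples, and each row sums to the number of samples in that true class.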
