How to get indices of instances during cross-validation

Question

I am doing binary classification. How can I extract the real index labels of the misclassified (or correctly classified) instances of the training data frame while doing K-fold cross-validation? I found no answer to this question here.

I got the values in the folds as described here:

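from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
skf = StratifiedKFold(n_splits=10, random_state=111, shuffle=True)  # random_state requires shuffle=True
cv_results = cross_val_score(model, X_train, y_train, cv=skf, scoring='roc_auc')
pred = cross_val_predict(model, X_train, y_train, cv=skf)  # out-of-fold predictions, one per training row
fold_pred = [pred[j] for i, j in skf.split(X_train, y_train)]  # j holds the test positions of each fold
fold_pred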

Is there a way to get the index of the misclassified (or correctly classified) instances? The output should be a data frame that contains only the instances that were misclassified (or correctly classified) during cross-validation.

Desired output: the misclassified instances of the data frame, with their real indices.

     col1 col2 col3 col4  target
13    0    1    0    0    0
14    0    1    0    0    0
18    0    1    0    0    1
22    0    1    0    0    0

Here the input has 100 instances, and 4 of them (index numbers 13, 14, 18 and 22) are misclassified during CV.

Recommended answer

With cross_val_predict you already have the predictions: it returns one out-of-fold prediction per row, in the same order as the input. It is then a matter of subsetting your data frame to the rows where the prediction differs from the true label, for example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, StratifiedKFold 
from sklearn.datasets import load_breast_cancer
import pandas as pd

data = load_breast_cancer()
df = pd.DataFrame(data.data[:,:5],columns=data.feature_names[:5])
df['label'] = data.target

rfc = RandomForestClassifier()
skf = StratifiedKFold(n_splits=10,random_state=111,shuffle=True)

pred = cross_val_predict(rfc, df.iloc[:,:5], df['label'], cv=skf)

df[df['label']!=pred]
 
     mean radius  mean texture  ...  mean smoothness  label
3          11.42         20.38  ...          0.14250      0
5          12.45         15.70  ...          0.12780      0
9          12.46         24.04  ...          0.11860      0
22         15.34         14.26  ...          0.10730      0
31         11.84         18.70  ...          0.11090      0
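
If it also matters which fold each instance was tested in, the same result can be built by looping over the splitter by hand. The following is only a sketch under a few assumptions: it reuses the breast cancer data from the answer, fixes random_state=0 on the classifier purely for repeatability, and the names records, cv_pred and result are illustrative rather than anything from the original post. Note that split() yields positional indices, so they are mapped back to the data frame's own index labels.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

data = load_breast_cancer()
df = pd.DataFrame(data.data[:, :5], columns=data.feature_names[:5])
df["label"] = data.target

X, y = df.iloc[:, :5], df["label"]
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=111)

records = []  # one small frame of predictions per fold (illustrative name)
for fold, (train_pos, test_pos) in enumerate(skf.split(X, y)):
    # split() returns positions, not index labels, so select rows with iloc
    model = RandomForestClassifier(random_state=0)  # assumption: fixed seed for repeatability
    model.fit(X.iloc[train_pos], y.iloc[train_pos])
    fold_pred = model.predict(X.iloc[test_pos])
    records.append(
        pd.DataFrame(
            {"fold": fold, "pred": fold_pred},
            index=X.index[test_pos],  # map positions back to the real index labels
        )
    )

# Each row lands in exactly one test fold, so the concatenation covers the whole frame
cv_pred = pd.concat(records).sort_index()
result = df.join(cv_pred)

misclassified = result[result["label"] != result["pred"]]
correct = result[result["label"] == result["pred"]]
print(misclassified.index.tolist())
```

Because the predictions are joined on the index rather than by position, this should also work when the data frame has a non-default index, as long as the index labels are unique.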
