How to implement SMOTE in cross validation and GridSearchCV


Question

I'm relatively new to Python. Can you help me turn my implementation of SMOTE into a proper pipeline? I want to apply over- and under-sampling to the training set of every k-fold iteration, so that the model is trained on a balanced data set and evaluated on the imbalanced held-out fold. The problem is that when I do this, I cannot use the familiar sklearn interface for evaluation and grid search.

Is it possible to build something similar to model_selection.RandomizedSearchCV? My take on this:

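import numpy as np
import pandas as pd
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

df = pd.read_csv("Imbalanced_data.csv")  # Load the data set
X = df.iloc[:, 0:64].values
y = df.iloc[:, 64].values  # Assumed to be a binary 0/1 target
n_splits = 2
n_measures = 2  # Recall and AUC
kf = StratifiedKFold(n_splits=n_splits)  # Stratified so every fold keeps the class ratio
clf_rf = RandomForestClassifier(n_estimators=25, random_state=1)
scores = np.zeros((n_splits, n_measures))
# Index the score matrix by fold number, not by the test sample indices
for fold, (train_index, test_index) in enumerate(kf.split(X, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # Newer imbalanced-learn releases use sampling_strategy (formerly ratio)
    sm = SMOTE(sampling_strategy='auto', k_neighbors=5)
    smote_enn = SMOTEENN(smote=sm)
    # Resample the training fold only; fit_resample replaces the older fit_sample
    X_train_res, y_train_res = smote_enn.fit_resample(X_train, y_train)
    clf_rf.fit(X_train_res, y_train_res)
    y_pred = clf_rf.predict(X_test)  # predict() takes the features only
    y_score = clf_rf.predict_proba(X_test)[:, 1]  # Positive-class probability for AUC
    scores[fold, 0] = recall_score(y_test, y_pred)
    scores[fold, 1] = roc_auc_score(y_test, y_score)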

Recommended answer

You need to look at the pipeline object. imbalanced-learn has a Pipeline that extends the scikit-learn Pipeline so that it supports the samplers' fit_sample() and sample() methods (fit_sample() was renamed fit_resample() in imbalanced-learn 0.4) in addition to scikit-learn's fit_predict(), fit_transform() and predict() methods. Because the samplers in such a pipeline run only during fitting, each cross-validation training fold is resampled while the held-out fold is scored as-is.

Take a look at this example here:
