Sampling before or after feature selection


Question

I am confused about the order of feature selection, sampling and cross-validation. My dataset has 468 rows and 23,000 columns; 269 rows belong to class I and 199 to class II. After splitting into train and test sets, the training set has 215 class I and 159 class II rows, and the test set has 54 class I and 40 class II rows. Because the sample is small, I applied SMOTE oversampling to the training data to reduce bias. Should I apply undersampling here instead? That discards data and leaves a much smaller sample.

The orderings I am considering are:

I) Oversample first, then apply feature selection, then cross-validate. During cross-validation, the rows repeated by oversampling might introduce bias.
II) Apply feature selection first, then oversample, then cross-validate. This would introduce the same bias as above.
III) Apply feature selection first, then, inside a 10-fold cross-validation, resample only the data in the 9 training folds.
IV) Start with cross-validation, and inside each iteration perform feature selection and then oversample the training data restricted to the selected features.
V) Start with cross-validation, and inside each iteration resample the 9 training folds and then perform feature selection on that resampled data.

Which of these is the correct approach, and which also gives good results?
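The concern raised in option I (repeated rows biasing cross-validation) can be checked directly. The sketch below uses plain random duplication as a simpler stand-in for SMOTE (SMOTE creates interpolated near-copies rather than exact copies, so the effect is subtler but has the same cause) and tracks only hypothetical row IDs for the 374-row training split. It counts validation rows whose exact copy also sits in the training folds, first when oversampling happens before an unstratified 10-fold split, then when it happens inside each fold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the training split in the question:
# 215 class-I rows (label 0) and 159 class-II rows (label 1), each identified by a unique ID.
n_major, n_minor = 215, 159
row_ids = np.arange(n_major + n_minor)
labels = np.array([0] * n_major + [1] * n_minor)


def random_oversample(ids, y, rng):
    """Duplicate minority-class (label 1) IDs with replacement until both classes are equal in size."""
    minority = ids[y == 1]
    extra = rng.choice(minority, size=(y == 0).sum() - minority.size, replace=True)
    return np.concatenate([ids, extra]), np.concatenate([y, np.ones(extra.size, dtype=int)])


def kfold_indices(n, k, rng):
    """Plain (unstratified) k-fold split of positions 0..n-1."""
    return np.array_split(rng.permutation(n), k)


def leaked_rows_oversample_first(k=10):
    """Oversample the whole training set, then split it into folds."""
    ids, y = random_oversample(row_ids, labels, rng)
    folds = kfold_indices(ids.size, k, rng)
    leaked = 0
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # A validation row "leaks" if the same original row also appears in the training folds.
        leaked += np.isin(ids[val], ids[train]).sum()
    return int(leaked)


def leaked_rows_oversample_inside(k=10):
    """Split first, then oversample only the training folds of each iteration."""
    folds = kfold_indices(row_ids.size, k, rng)
    leaked = 0
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        tr_ids, _ = random_oversample(row_ids[train], labels[train], rng)
        leaked += np.isin(row_ids[val], tr_ids).sum()
    return int(leaked)


print("oversample first:       ", leaked_rows_oversample_first())
print("oversample inside folds:", leaked_rows_oversample_inside())
```

The second count is always 0, because the validation fold's IDs never enter the oversampled training set; the first is typically well above 0, which is the bias option I describes.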

Recommended answer

According to the SMOTE paper, feature selection should be performed before sampling.
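The answer gives only the order: feature selection, then sampling. One arrangement consistent with it that also keeps the validation data untouched is option IV from the question: inside every cross-validation fold, select features on the training folds only, then apply SMOTE to those training folds using only the selected features. The sketch below is a minimal NumPy-only illustration, not the SMOTE paper's reference code or imbalanced-learn's implementation. The Welch t-statistic filter, the nearest-centroid classifier, `n_features=50`, `k=5` and the synthetic 374 × 2000 data (2,000 columns instead of 23,000 to keep it quick) are all hypothetical choices.

```python
import numpy as np


def t_scores(X, y):
    """Absolute Welch t-statistic per feature (a simple filter-style selector; assumes labels 0/1)."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b)) + 1e-12
    return np.abs(num / den)


def smote(X_min, n_new, k, rng):
    """Minimal SMOTE: interpolate between a minority sample and one of its k nearest minority neighbours."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)   # random minority samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                     # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


def stratified_folds(y, n_folds, rng):
    """Split indices into folds that keep the class ratio roughly constant."""
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for i, part in enumerate(np.array_split(idx, n_folds)):
            folds[i].extend(part)
    return [np.array(f) for f in folds]


def cv_select_then_smote(X, y, n_features=50, n_folds=10, k=5, seed=0):
    """Option IV: per fold, feature selection then SMOTE, both on the training folds only."""
    rng = np.random.default_rng(seed)
    accs = []
    for val in stratified_folds(y, n_folds, rng):
        train = np.setdiff1d(np.arange(len(y)), val)
        X_tr, y_tr = X[train], y[train]
        # 1) Feature selection on the training folds only.
        keep = np.argsort(t_scores(X_tr, y_tr))[-n_features:]
        X_tr = X_tr[:, keep]
        # 2) SMOTE on the selected features of the training folds only.
        counts = np.bincount(y_tr)
        minority = np.argmin(counts)
        n_new = counts.max() - counts.min()
        if n_new > 0:
            X_syn = smote(X_tr[y_tr == minority], n_new, k, rng)
            X_tr = np.vstack([X_tr, X_syn])
            y_tr = np.concatenate([y_tr, np.full(n_new, minority)])
        # 3) Fit a nearest-centroid classifier and score the untouched validation fold.
        centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in (0, 1)])
        X_val = X[val][:, keep]
        pred = np.argmin(np.linalg.norm(X_val[:, None, :] - centroids[None], axis=2), axis=1)
        accs.append((pred == y[val]).mean())
    return float(np.mean(accs))


# Hypothetical data shaped like the question's training split: 215 class I (0), 159 class II (1).
data_rng = np.random.default_rng(42)
y = np.array([0] * 215 + [1] * 159)
X = data_rng.normal(size=(len(y), 2000))
X[y == 1, :20] += 1.0   # only the first 20 columns carry signal
print("mean CV accuracy:", cv_select_then_smote(X, y))
```

In a real project, imbalanced-learn's `SMOTE` placed inside an `imblearn.pipeline.Pipeline` after a scikit-learn selector such as `SelectKBest`, and passed to `cross_val_score`, gives the same per-fold ordering, because the pipeline applies the sampler only when fitting on the training folds.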
