SMOTE oversampling and cross-validation

Views: 491 | Published: 2020/5/4 9:48:07 | Tags: machine-learning, weka, text-classification


Problem description

I am working on a binary classification problem in Weka with a highly imbalanced data set (90% in one class and 10% in the other). I first applied SMOTE (http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html) to the entire data set to balance the classes, and then ran 10-fold cross-validation on the resulting data. The results look (overly?) optimistic, with F1 around 90%.

Is this due to the oversampling? Is it bad practice to run cross-validation on data to which SMOTE has already been applied? Is there a way to solve this problem?
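For context, a concern often raised with this setup is that synthetic points created before the split can land in a different fold from the real samples they were interpolated from, so the test folds are no longer independent of the training data. A minimal sketch of the alternative, oversampling only the training part of each fold, is below. It is an illustration under stated assumptions, not a Weka recipe: the SMOTE routine is a simplified pure-Python version, the nearest-centroid classifier and the Gaussian toy data (180 vs. 20 samples) are stand-ins, and every function name is hypothetical. In Weka itself, wrapping the SMOTE filter and the classifier in a `FilteredClassifier` is, as far as I know, the usual way to get the same per-fold behaviour.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours to create n_new synthetic points."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        return []
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def stratified_folds(labels, n_folds, rng):
    """Assign indices to folds while keeping class proportions per fold."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def f1(truth, preds, positive=1):
    """F1 score for the positive (minority) class."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, preds))
    fp = sum(t != positive and p == positive for t, p in zip(truth, preds))
    fn = sum(t == positive and p != positive for t, p in zip(truth, preds))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def nearest_centroid(X_train, y_train, X_test):
    """Stand-in classifier: predict the class with the closest centroid."""
    centroids = {}
    for cls in set(y_train):
        pts = [x for x, lab in zip(X_train, y_train) if lab == cls]
        centroids[cls] = tuple(sum(c) / len(pts) for c in zip(*pts))

    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist(x, centroids[c])) for x in X_test]


def cross_validate_with_smote(X, y, fit_predict, n_folds=10, seed=0):
    """Per-fold minority-class F1. SMOTE sees only the training part of
    each fold; the held-out fold contains original samples exclusively."""
    rng = random.Random(seed)
    scores = []
    for test_idx in stratified_folds(y, n_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        minority = [x for x, lab in zip(X_train, y_train) if lab == 1]
        # Number of synthetic points needed to match the majority class
        n_new = max(len(y_train) - 2 * len(minority), 0)
        new_points = smote(minority, n_new, rng=rng)
        preds = fit_predict(
            X_train + new_points,
            y_train + [1] * len(new_points),
            [X[i] for i in test_idx],
        )
        scores.append(f1([y[i] for i in test_idx], preds))
    return scores


# Toy data with the 90% / 10% split from the question (assumed distributions)
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(180)]
X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(20)]
y = [0] * 180 + [1] * 20

scores = cross_validate_with_smote(X, y, nearest_centroid)
print("mean F1 over folds:", sum(scores) / len(scores))
```

Comparing this per-fold estimate with the figure obtained by oversampling first is one way to gauge how much of the 90% F1 comes from the evaluation setup rather than from the classifier.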
