Combination of SMOTE and undersampling on Weka

Question

According to the paper by Chawla et al. (2002), the best performance when balancing data comes from combining undersampling with SMOTE.

I've tried to balance my dataset by combining undersampling and SMOTE, but I am a bit confused about the attribute for undersampling.
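For readers unfamiliar with the combination being discussed, here is a minimal sketch in plain Python (not Weka code). It randomly undersamples the majority class and then grows the minority class with SMOTE-style interpolation between a minority row and one of its nearest minority neighbours. The function names, the toy data, and the choice of k = 3 are all assumptions made for illustration.

```python
import random

def undersample(majority, keep, rng):
    """Randomly keep `keep` majority rows, without replacement."""
    return rng.sample(majority, keep)

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic rows, each on the segment between a minority
    row and one of its k nearest minority neighbours (SMOTE-style)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

rng = random.Random(0)
majority = [(float(i), float(i % 7)) for i in range(40)]  # 40 toy majority rows
minority = [(100.0 + i, 50.0 + i) for i in range(10)]     # 10 toy minority rows

majority_kept = undersample(majority, 20, rng)            # 40 -> 20 rows
synthetic = smote_like(minority, 10, 3, rng)              # 10 new synthetic rows
minority_all = minority + synthetic                       # 10 -> 20 rows
print(len(majority_kept), len(minority_all))
```

The point of the sketch is that the new minority rows are interpolated values that did not exist in the original data, while the majority rows that remain are untouched originals.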

In Weka, the Resample filter can be used to decrease the majority class. Resample has an attribute, biasToUniformClass: whether to use bias towards a uniform class. A value of 0 leaves the class distribution as-is; a value of 1 ensures the class distribution is uniform in the output data.
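To make the effect of biasToUniformClass concrete, here is a rough arithmetic model in plain Python. It assumes each class's output size is a blend of its original count and an equal share of the total, scaled by the sample size percentage. That blend is my reading of the parameter description, not a copy of Weka's source, and the function name and the toy counts are hypothetical.

```python
def expected_class_counts(class_counts, bias, sample_size_percent=100.0):
    """Approximate per-class output sizes for a given bias (0.0 to 1.0).

    Assumed model: blend each class's own count with an equal share of
    the total, then scale by the requested sample size.
    """
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    scale = sample_size_percent / 100.0
    return {
        label: int(scale * ((1 - bias) * n + bias * total / n_classes))
        for label, n in class_counts.items()
    }

counts = {"majority": 900, "minority": 100}

# bias = 0: both classes shrink in proportion (900 -> 450, 100 -> 50).
as_is = expected_class_counts(counts, bias=0.0, sample_size_percent=50.0)
# bias = 1: both classes end up the same size (900 -> 250, 100 -> 250).
uniform = expected_class_counts(counts, bias=1.0, sample_size_percent=50.0)
print(as_is, uniform)
```

Under this model, a bias of 1 shrinks the majority class and grows the minority class at the same time. Note that Resample only draws existing instances, so any growth in the minority class comes from repeated rows rather than from synthetic rows of the kind SMOTE generates.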

When I use the value 0, the data in the majority class decreases and so does the data in the minority class. When I use the value 1, the data in the majority class decreases, but the data in the minority class increases.

I tried the value 1 for that attribute without using SMOTE to increase the instances of the minority class, because the data is already balanced and the result is good too.

So, is that the same as combining SMOTE and undersampling, or do I still have to use the value 0 for that attribute and then apply SMOTE?

Recommended answer

For undersampling, see the EasyEnsemble algorithm (a Weka implementation was developed by Schubach, Robinson, and Valentini).

The EasyEnsemble algorithm lets you split the data into a number of balanced partitions. To achieve this balance, set the numIterations parameter to:

(# majority instances) / (# minority instances) = numIterations

For example, if there are 30 total instances with 20 in the majority class and 10 in the minority class, set the numIterations parameter to 2 (i.e., 20 majority instances / 10 minority instances = 2 balanced partitions). Each of these 2 partitions should contain 20 instances: the same 10 minority instances plus a different 10 instances from the majority class.
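The partitioning in this example can be sketched in a few lines of plain Python. This is an illustration of the arithmetic described above, not the Weka implementation: the function name is hypothetical, and splitting the shuffled majority class into disjoint chunks is an assumption taken from the description of "a different 10 instances" per partition.

```python
import random

def balanced_partitions(majority, minority, rng):
    """Split the majority class into chunks the size of the minority class
    and pair each chunk with the full minority class."""
    num_iterations = len(majority) // len(minority)  # e.g. 20 // 10 == 2
    shuffled = majority[:]                           # leave the input untouched
    rng.shuffle(shuffled)
    size = len(minority)
    return [
        shuffled[i * size:(i + 1) * size] + minority
        for i in range(num_iterations)
    ]

majority = [("maj", i) for i in range(20)]
minority = [("min", i) for i in range(10)]
partitions = balanced_partitions(majority, minority, random.Random(0))
print(len(partitions), [len(p) for p in partitions])  # 2 [20, 20]
```

If the majority count is not an exact multiple of the minority count, the integer division here drops the leftover majority instances; how the real implementation treats that case is not covered by this sketch.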

The algorithm then trains a classifier on each of the balanced partitions and, at test time, ensembles the classifiers trained on those partitions to make a prediction.
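The train-then-ensemble step can be sketched as follows, again in plain Python rather than Weka. A nearest-centroid classifier stands in for the base learner and a simple majority vote stands in for the ensembling rule. Both are assumptions chosen to keep the example short, and the toy data and function names are hypothetical.

```python
from collections import Counter

def train_centroid_classifier(rows):
    """Stand-in base learner. rows is a list of (features, label) pairs;
    returns a predict(features) function using the nearest class centroid."""
    sums, counts = {}, Counter()
    for features, label in rows:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
    centroids = {
        label: [s / counts[label] for s in acc] for label, acc in sums.items()
    }

    def predict(features):
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    return predict

def ensemble_predict(classifiers, features):
    """Majority vote; on a tie, the label that was voted first wins."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]

minority = [((10.0 + i, 10.0), "min") for i in range(10)]
majority = [((float(i), 0.0), "maj") for i in range(20)]
# Two balanced partitions: same minority rows, different majority rows.
partitions = [majority[:10] + minority, majority[10:] + minority]
classifiers = [train_centroid_classifier(p) for p in partitions]

near_minority = ensemble_predict(classifiers, (12.0, 9.0))
near_majority = ensemble_predict(classifiers, (3.0, 1.0))
print(near_minority, near_majority)  # min maj
```

Each classifier sees a balanced 10 vs 10 training set, and every majority instance is still used by one member of the ensemble, so no majority data is thrown away.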
