How do you handle data imbalance in SVM?

Question

If I am training an SVM on a large training set and the class variable is either True or False, would having very few True values compared to the number of False values in the training set affect the trained model or its results? Should the two classes be equally represented? If my training set doesn't have an equal distribution of True and False, how do I handle this so that training is done as efficiently as possible?

Recommended answer

It's fine to have imbalanced data, because an SVM can assign a greater penalty to misclassification errors on the less frequent class ("True" in your case) instead of weighting all errors equally, which tends to produce an undesirable classifier that assigns everything to the majority class. However, you'll probably get better results with balanced data. It really depends on your data.
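As a rough illustration of the per-class penalty idea, here is a minimal sketch using scikit-learn, which the answer itself does not mention; the library choice, the synthetic dataset, and all parameter values are assumptions made for the example. Setting class_weight="balanced" on SVC scales the penalty parameter C for each class inversely to that class's frequency in the training labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, imbalanced data (assumption): roughly 5% of labels are 1 ("True").
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
# Stratify so the train and test splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Baseline: every misclassification carries the same penalty C.
plain = SVC(kernel="rbf").fit(X_train, y_train)

# Weighted: C is multiplied per class by n_samples / (n_classes * class_count),
# so errors on the rare class cost more.
weighted = SVC(kernel="rbf", class_weight="balanced").fit(X_train, y_train)

# Recall on the rare class is usually the number to watch with imbalanced data.
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))
print("per-class C multipliers:", weighted.class_weight_)
print("minority recall, plain vs. weighted:", recall_plain, recall_weighted)
```

Whether the weighted model is actually better depends on the data, as the answer says, so compare the two on held-out data with a metric that reflects the rare class rather than plain accuracy.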

You could also skew the data artificially to make it more balanced. This paper is worth a look: http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF.
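The answer does not say how to rebalance the data. One common option is random oversampling, which duplicates minority-class rows until the classes are the same size; the sketch below assumes that technique, and the helper name oversample_minority and the toy data are hypothetical. It uses only the Python standard library.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap; the largest class gets no extras.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumption): 10 True rows and 90 False rows.
rows = [[i, i * 0.5] for i in range(100)]
labels = [i < 10 for i in range(100)]

balanced_rows, balanced_labels = oversample_minority(rows, labels)
counts = Counter(balanced_labels)
print(counts)
```

Resample only the training portion. If duplicated rows leak into the test set, the evaluation will look better than it really is.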
