What is the difference between the sample weight and class weight options in scikit-learn?


Problem description

I have a class imbalance problem and want to solve it using cost-sensitive learning. The approaches I am considering are:

  1. Undersampling and oversampling
  2. Assigning weights so that a modified loss function is used

Question

Scikit-learn has two options called class weights and sample weights. Is sample weight actually doing option 2) and class weight doing option 1)? Is option 2) the recommended way of handling class imbalance?

Recommended answer

These are similar concepts, but with sample_weight you can force the estimator to pay more attention to particular samples, and with class_weight you can force it to pay more attention to particular classes. sample_weight=0 or class_weight=0 basically means that the estimator does not take those samples/classes into account during learning at all. Thus a classifier, for example, will never predict a class whose class_weight is 0. If the sample_weight/class_weight of some samples/classes is larger than that of others, the estimator will prioritize minimizing the error on those samples/classes. You can use user-defined sample_weight and class_weight simultaneously.
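
To make the relationship concrete, here is a minimal sketch, assuming a synthetic dataset and an arbitrary 5:1 cost ratio (both are illustrative choices, not part of the original answer). It fits LogisticRegression once with a class_weight dict and once with the same weights spelled out per sample through sample_weight, then compares the learned coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced binary problem (roughly 90% class 0, 10% class 1).
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.9, 0.1], random_state=0
)

# Assumed cost ratio for illustration: class 1 counts five times as much.
class_weight = {0: 1.0, 1: 5.0}

# Option A: per-class weights, expanded by the estimator internally.
clf_class = LogisticRegression(class_weight=class_weight, max_iter=1000)
clf_class.fit(X, y)

# Option B: the same weights passed explicitly, one value per sample.
per_sample = np.where(y == 1, 5.0, 1.0)
clf_sample = LogisticRegression(max_iter=1000)
clf_sample.fit(X, y, sample_weight=per_sample)

# Both fits should end up with (numerically) the same model.
same_coef = np.allclose(clf_class.coef_, clf_sample.coef_, atol=1e-4)
print("coefficients match:", same_coef)
```

In this sketch class_weight behaves like a shorthand for a sample_weight array that is constant within each class; sample_weight is the more general tool when individual samples need different weights.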

If you undersample or oversample your training set by simply removing or cloning samples, this is equivalent to decreasing or increasing the corresponding sample_weight/class_weight values.
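
The equivalence follows from the loss being a sum over samples: a sample that appears k times contributes the same as one copy with weight k. A small sketch in plain Python, assuming a binary log loss and made-up predicted probabilities (the helper log_loss_sum is hypothetical, written only for this illustration):

```python
import math

def log_loss_sum(labels, probs, weights=None):
    """Weighted sum of binary cross-entropy terms; weights default to 1."""
    if weights is None:
        weights = [1.0] * len(labels)
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

# Toy data: three majority samples (label 0) and one minority sample (label 1).
labels = [0, 0, 0, 1]
probs = [0.2, 0.1, 0.3, 0.4]  # made-up predicted P(label == 1)

# Oversampling: clone the minority sample so that it appears three times.
loss_cloned = log_loss_sum(labels + [1, 1], probs + [0.4, 0.4])

# Weighting: keep the data unchanged and give the minority sample weight 3.
loss_weighted = log_loss_sum(labels, probs, weights=[1, 1, 1, 3])

print(math.isclose(loss_cloned, loss_weighted))
```

The same argument covers undersampling: removing a sample has the same effect on the loss as giving it a weight of 0.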

In more complex cases you can also try to generate samples artificially, with techniques like SMOTE.
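
The core idea behind SMOTE can be sketched in a few lines of NumPy: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the reference algorithm or a library implementation; the function name smote_like and its parameters are hypothetical, and in practice a maintained implementation such as the SMOTE class in the imbalanced-learn package would be the usual choice.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)  # starting samples
    pick = rng.integers(0, k, size=n_new)           # which neighbour to use
    other = neighbours[base, pick]
    gap = rng.random((n_new, 1))                    # position along the segment
    return X_min[base] + gap * (X_min[other] - X_min[base])

# Toy minority class with four 2-D points (k must be smaller than this count).
X_minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
X_synthetic = smote_like(X_minority, n_new=6)
print(X_synthetic.shape)
```

Unlike cloning, the generated points are new feature vectors, so this cannot be reproduced by adjusting sample_weight or class_weight.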
