What is the difference between the sample weight and class weight options in scikit-learn?

Problem description

I have a class imbalance problem and want to solve it using cost-sensitive learning. The two options I know of are:

  1. Undersampling and oversampling
  2. Giving weights to classes so that a modified loss function is used

Question

scikit-learn has two options, called class weights and sample weights. Is sample weight actually doing option 2) and class weight option 1)? Is option 2) the recommended way of handling class imbalance?

Recommended answer

They are similar concepts that act at different levels: with sample_weight you can force the estimator to pay more attention to particular samples, and with class_weight you can force it to pay more attention to particular classes. Neither option resamples the data; both change how much each sample contributes to the loss. Setting sample_weight=0 or class_weight=0 basically means the estimator does not take those samples or classes into account during learning at all. A classifier, for example, will therefore never predict a class whose class_weight is 0. If some samples or classes have a larger weight than others, the estimator will prioritize minimizing the error on those samples or classes. You can use user-defined sample_weight and class_weight values simultaneously.
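To make the relationship concrete, here is a minimal sketch, assuming a synthetic imbalanced dataset and a LogisticRegression model (neither comes from the original question, and the weight values are arbitrary). It fits the same model once with class_weight and once with the per-sample weights that those class weights imply, then compares the learned coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

# Illustrative imbalanced data: roughly 90% class 0, 10% class 1.
X, y = make_classification(
    n_samples=500, n_features=5, weights=[0.9, 0.1], random_state=0
)

# Hypothetical weights chosen for the example: errors on class 1 cost 5x more.
class_weight = {0: 1.0, 1: 5.0}

# Variant A: pass the weights per class to the estimator.
clf_cw = LogisticRegression(class_weight=class_weight, max_iter=1000).fit(X, y)

# Variant B: expand the class weights into one weight per sample
# and pass them to fit() instead.
sw = compute_sample_weight(class_weight, y)
clf_sw = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sw)

# Largest absolute difference between the two coefficient vectors.
coef_gap = float(np.max(np.abs(clf_cw.coef_ - clf_sw.coef_)))
print("max coefficient difference:", coef_gap)
```

In this setup the two fits should agree up to numerical precision, which suggests reading class_weight as a shorthand for giving every sample of a class the same sample_weight. sample_weight is the more general option, since it also allows different weights within one class.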

If you undersample or oversample your training set by simply removing or cloning samples, the effect is equivalent to decreasing or increasing the corresponding sample_weight or class_weight values.
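The equivalence between cloning and weighting can be checked with a small sketch. The dataset, the LogisticRegression model and the factor k = 3 are assumptions made for illustration. The minority class is oversampled by cloning each of its samples so that it appears k times, and the result is compared with a fit on the original data where each minority sample gets weight k.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative imbalanced data (not from the original post).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

k = 3  # hypothetical oversampling factor for the minority class
minority = y == 1

# Option 1: oversample by cloning, so each minority sample appears k times.
X_over = np.vstack([X, np.repeat(X[minority], k - 1, axis=0)])
y_over = np.concatenate([y, np.repeat(y[minority], k - 1)])
clf_over = LogisticRegression(tol=1e-8, max_iter=5000).fit(X_over, y_over)

# Option 2: keep the data as is and give each minority sample weight k.
sw = np.where(minority, float(k), 1.0)
clf_w = LogisticRegression(tol=1e-8, max_iter=5000).fit(X, y, sample_weight=sw)

# Both fits minimize the same weighted objective, so the gap should be tiny.
coef_gap = float(np.max(np.abs(clf_over.coef_ - clf_w.coef_)))
print("max coefficient difference:", coef_gap)
```

The match relies on the estimator minimizing a weighted sum of per-sample losses. Estimators that involve randomness over samples, such as bootstrapping or stochastic gradient descent, may not give identical results for the two variants.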

In more complex cases you can also try generating samples artificially, with techniques like SMOTE.
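The core idea behind SMOTE is to create new minority samples by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the reference algorithm or the imbalanced-learn implementation. The function name smote_like_oversample and its parameters are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority neighbours (simplified sketch)."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    # Pick a random base sample and one of its k neighbours for each new point.
    base = rng.integers(0, len(X_min), size=n_new)
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    # Place the new point at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


# Usage on made-up minority-class data: 20 points in 2 dimensions.
X_min = np.random.default_rng(42).normal(size=(20, 2))
X_syn = smote_like_oversample(X_min, n_new=30)
print(X_syn.shape)
```

Unlike cloning, the synthetic points are new feature vectors, so this approach has no exact sample_weight equivalent. For real use, a maintained implementation such as the SMOTE class in the imbalanced-learn package is likely a better choice than this sketch.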
