scikit-learn: Random forest class_weight and sample_weight parameters

Views: 295 | Published: 2021/7/16 19:59:46 | Tags: python, scikit-learn


Question

I have a class imbalance problem and have been experimenting with a weighted Random Forest using the implementation in scikit-learn (>= 0.16).

I have noticed that the implementation takes a class_weight parameter in the tree constructor and a sample_weight parameter in the fit method to help with class imbalance. The two seem to be multiplied, though, to decide a final weight.

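I am having trouble understanding the following:

At which stages of tree construction/training/prediction are these weights used? I have seen some papers on weighted trees, but I am not sure what scikit implements.
What exactly is the difference between class_weight and sample_weight?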

Recommended answer

RandomForests are built on Trees, which are very well documented. Check how Trees use the sample weighting:

User guide on decision trees - states exactly which algorithm is used
Decision tree API - explains how sample_weight is used by trees (which for random forests, as you have determined, is the product of class_weight and sample_weight).

As for the difference between class_weight and sample_weight: much follows simply from their datatypes. sample_weight is a 1D array of length n_samples, assigning an explicit weight to each example used for training. class_weight is either a dictionary mapping each class to a uniform weight for that class (e.g., {1: .9, 2: .5, 3: .01}), or a string telling sklearn how to determine this dictionary automatically.
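To make the datatype difference concrete, here is a minimal sketch. The labels and weight values are made up for illustration and are not from the question; `compute_class_weight` is scikit-learn's public helper for turning the "balanced" string into per-class weights.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (an assumption for illustration): 8 samples of class 0, 2 of class 1.
y = np.array([0] * 8 + [1] * 2)

# sample_weight: one explicit weight per training example, shape (n_samples,).
sample_weight = np.ones(len(y))
sample_weight[-1] = 3.0  # e.g. count the last example three times as much

# class_weight as a dict: one uniform weight per class label.
class_weight = {0: 1.0, 1: 4.0}

# class_weight as a string: "balanced" derives the per-class weights from
# label frequencies as n_samples / (n_classes * np.bincount(y)).
classes = np.unique(y)
balanced = compute_class_weight("balanced", classes=classes, y=y)
balanced_dict = dict(zip(classes.tolist(), balanced.tolist()))
print(balanced_dict)
```

With these toy labels the "balanced" weights work out to 10 / (2 * 8) = 0.625 for class 0 and 10 / (2 * 2) = 2.5 for class 1, so the rarer class gets the larger weight.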

So the training weight for a given example is the product of its explicitly provided sample_weight (or 1 if sample_weight is not provided) and its class_weight (or 1 if class_weight is not provided).
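The product described above can be sketched as follows. The data and weight values are invented for illustration. `compute_sample_weight` is used here only to expand the class_weight dictionary to one value per example; the `effective` array is computed by hand to mirror the answer's description, and is not read back from the fitted forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Toy data (an assumption for illustration): 10 samples, 3 features, imbalanced labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(10, 3))
y = np.array([0] * 8 + [1] * 2)

class_weight = {0: 1.0, 1: 4.0}                     # per-class weights
sample_weight = np.linspace(0.5, 1.5, num=len(y))   # per-example weights

# Expand class_weight so that each example carries its class's weight.
expanded = compute_sample_weight(class_weight, y)

# Effective training weight per example, as described in the answer:
# sample_weight times the class weight of that example's label.
effective = sample_weight * expanded

# class_weight goes to the constructor, sample_weight goes to fit().
clf = RandomForestClassifier(n_estimators=10, class_weight=class_weight, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)
```

If either argument is omitted, the corresponding factor is simply 1 for every example, so passing only class_weight or only sample_weight is a special case of the same product.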

