Keras - class_weight vs sample_weights in the fit_generator
Question
In Keras (using TensorFlow as a backend) I am building a model that works with a huge dataset that has highly imbalanced classes (labels). To be able to run the training process, I created a generator which feeds chunks of data to fit_generator.
According to the documentation for fit_generator, the output of the generator can be either the tuple (inputs, targets) or the tuple (inputs, targets, sample_weights). With that in mind, here are a few questions:
- My understanding is that class_weight regards the weights of all classes for the entire dataset, whereas sample_weights regards the weights of all classes for each individual chunk created by the generator. Is that correct? If not, can someone elaborate on the matter?
- Is it necessary to give both the class_weight to the fit_generator and the sample_weights as an output for each chunk? If yes, then why? If not, then which one is better to give?
- If I should give the sample_weights for each chunk, how do I map the weights if some of the classes are missing from a specific chunk? Let me give an example. In my overall dataset, I have 7 possible classes (labels). Because these classes are highly imbalanced, when I create smaller chunks of data as an output from the fit_generator, some of the classes are missing from a specific chunk. How should I create the sample_weights for these chunks?
Answer
My understanding is that the class_weight regards the weights of all classes for the entire dataset whereas the sample_weights regards the weights of all classes for each individual chunk created by the generator. Is that correct? If not, can someone elaborate on the matter?
class_weight affects the relative weight of each class in the calculation of the objective function. sample_weights, as the name suggests, allows further control of the relative weight of samples that belong to the same class.
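The distinction can be shown with a small sketch (the class labels, counts, and weight values below are made up for illustration): class_weight is one number per class for the whole dataset, while sample_weights is one number per sample in a chunk. A common way to build the latter is to look each sample's class up in the former, but any per-sample value works.

```python
import numpy as np

# Hypothetical chunk of labels from a 7-class problem (values are illustrative).
y_chunk = np.array([0, 0, 0, 1, 2, 2, 5])

# class_weight: one relative weight per class, computed once for the whole dataset.
class_weight = {0: 0.5, 1: 2.0, 2: 1.0, 3: 4.0, 4: 4.0, 5: 3.0, 6: 4.0}

# sample_weights: one weight per sample in this chunk. Here we simply
# look up each sample's class weight, but any per-sample value is allowed.
sample_weights = np.array([class_weight[c] for c in y_chunk])

print(sample_weights.tolist())  # [0.5, 0.5, 0.5, 2.0, 1.0, 1.0, 3.0]
```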
Is it necessary to give both the class_weight to the fit_generator and then the sample_weights as an output for each chunk? If yes, then why? If not then which one is better to give?
It depends on your application. Class weights are useful when training on highly skewed data sets; for example, a classifier to detect fraudulent transactions. Sample weights are useful when you don't have equal confidence in the samples in your batch. A common example is performing regression on measurements with variable uncertainty.
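For the skewed-dataset case, one common recipe (a sketch, not the only option) is inverse-frequency class weights, normalized so they average to 1. The 90/10 split below is a toy assumption:

```python
import numpy as np

# Toy imbalanced binary labels: 90% class 0, 10% class 1 (illustrative only).
labels = np.array([0] * 90 + [1] * 10)

counts = np.bincount(labels)
n_classes = len(counts)

# Inverse-frequency weighting: rare classes get proportionally larger weights,
# and the normalization keeps the average weight at 1.
class_weight = {c: len(labels) / (n_classes * counts[c]) for c in range(n_classes)}

print(class_weight[1])  # 5.0 -- the minority class is upweighted 9x vs class 0
```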
If I should give the sample_weights for each chunk, how do I map the weights if some of the classes are missing from a specific chunk? Let me give an example. In my overall dataset, I have 7 possible classes (labels). Because these classes are highly imbalanced, when I create smaller chunks of data as an output from the fit_generator, some of the classes are missing from the specific chunk. How should I create the sample_weights for these chunks?
This is not an issue. sample_weights is defined on a per-sample basis and is independent of the class. For this reason, the documentation states that the elements of (inputs, targets, sample_weights) should all be the same length.
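A sketch of such a generator (the weight values and helper name are made up for illustration): because the weights are built per sample from the chunk's own labels, classes absent from a chunk simply contribute no entries, and the three arrays always have matching lengths.

```python
import numpy as np

# Precomputed weights for all 7 classes of the full dataset (illustrative values).
CLASS_WEIGHTS = {0: 0.2, 1: 1.0, 2: 1.5, 3: 2.0, 4: 2.0, 5: 3.0, 6: 5.0}

def chunk_generator(X, y, chunk_size):
    """Yield (inputs, targets, sample_weights) tuples of equal length.

    sample_weights is built per sample from the labels actually present,
    so it does not matter which classes are missing from a given chunk.
    """
    for start in range(0, len(X), chunk_size):
        xb = X[start:start + chunk_size]
        yb = y[start:start + chunk_size]
        wb = np.array([CLASS_WEIGHTS[c] for c in yb])
        yield xb, yb, wb

# This chunk contains only classes 0 and 6; classes 1-5 are simply unused.
X = np.zeros((4, 3))
y = np.array([0, 0, 6, 6])
_, _, w = next(chunk_generator(X, y, chunk_size=4))
print(w.tolist())  # [0.2, 0.2, 5.0, 5.0]
```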
The function _weighted_masked_objective in engine/training.py has an example of how sample_weights are applied.
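In simplified form (a sketch with NumPy, ignoring the masking part), that function scales each sample's loss by its weight and normalizes by the fraction of nonzero weights before averaging:

```python
import numpy as np

def weighted_loss(per_sample_loss, sample_weights):
    # Simplified sketch of the weighting in _weighted_masked_objective:
    # scale each sample's loss by its weight, then normalize by the
    # fraction of samples with a nonzero weight before taking the mean.
    score = per_sample_loss * sample_weights
    score = score / np.mean((sample_weights != 0).astype(float))
    return np.mean(score)

losses = np.array([1.0, 2.0, 3.0])
weights = np.array([1.0, 1.0, 2.0])
print(weighted_loss(losses, weights))  # 3.0 -- the last sample counts double
```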