Keras - class_weight vs sample_weights in the fit_generator

Question

In Keras (using TensorFlow as a backend) I am building a model that works with a huge dataset with highly imbalanced classes (labels). To be able to run the training process, I created a generator which feeds chunks of data to fit_generator.

According to the documentation for the fit_generator, the output of the generator can either be the tuple (inputs, targets) or the tuple (inputs, targets, sample_weights). Having that in mind, here are a few questions:

  1. My understanding is that the class_weight regards the weights of all classes for the entire dataset whereas the sample_weights regards the weights of all classes for each individual chunk created by the generator. Is that correct? If not, can someone elaborate on the matter?
  2. Is it necessary to give both the class_weight to the fit_generator and then the sample_weights as an output for each chunk? If yes, then why? If not then which one is better to give?
  3. If I should give the sample_weights for each chunk, how do I map the weights if some of the classes are missing from a specific chunk? Let me give an example. In my overall dataset, I have 7 possible classes (labels). Because these classes are highly imbalanced, when I create smaller chunks of data as output from the generator, some of the classes are missing from a given chunk. How should I create the sample_weights for these chunks?

Recommended answer

My understanding is that the class_weight regards the weights of all classes for the entire dataset whereas the sample_weights regards the weights of all classes for each individual chunk created by the generator. Is that correct? If not, can someone elaborate on the matter?

class_weight affects the relative weight of each class in the calculation of the objective function. sample_weights, as the name suggests, allows further control of the relative weight of samples that belong to the same class.
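
The relationship between the two can be sketched in a few lines of NumPy. This is only an illustration, not Keras code: the weight values, the labels, and the `confidence` array are made up for the example. It shows that a class weight gives every sample of a class the same weight, while a sample weight can differ between two samples of the same class.

```python
import numpy as np

# Hypothetical class weights for a 3-class problem (values are made up).
class_weight = {0: 1.0, 1: 5.0, 2: 20.0}

# Integer labels for one batch of six samples.
labels = np.array([0, 0, 1, 2, 0, 1])

# class_weight alone: every sample inherits the weight of its class.
weights_from_classes = np.array([class_weight[c] for c in labels])

# sample_weights can go further: here the last sample (class 1) is
# trusted half as much as the other class-1 sample (assumed values).
confidence = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.5])
sample_weights = weights_from_classes * confidence

print(weights_from_classes)
print(sample_weights)
```

The two class-1 samples end up with different weights (5.0 and 2.5), which a class_weight dictionary alone cannot express.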

Is it necessary to give both the class_weight to the fit_generator and then the sample_weights as an output for each chunk? If yes, then why? If not then which one is better to give?

It depends on your application. Class weights are useful when training on highly skewed data sets; for example, a classifier to detect fraudulent transactions. Sample weights are useful when you don't have equal confidence in the samples in your batch. A common example is performing regression on measurements with variable uncertainty.
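
As a rough sketch of both use cases, the snippet below derives class weights from label counts using the common "balanced" heuristic (n_samples / (n_classes * count)), and derives sample weights for a regression from per-measurement uncertainties using inverse-variance weighting. Both choices are assumptions made for illustration, and the labels and `sigma` values are invented; the answer does not prescribe a particular formula.

```python
import numpy as np

# Skewed classification labels: 8 samples of class 0, 2 of class 1.
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
counts = np.bincount(labels)

# "Balanced" heuristic: n_samples / (n_classes * count_per_class).
balanced = len(labels) / (len(counts) * counts)
class_weight = {cls: float(w) for cls, w in enumerate(balanced)}

# Regression measurements with a hypothetical standard deviation each.
sigma = np.array([0.1, 0.2, 0.5])

# Inverse-variance weighting: more certain measurements count more.
sample_weights = 1.0 / sigma**2

# Optional rescaling so the weights average to 1.
sample_weights = sample_weights / sample_weights.mean()

print(class_weight)
print(sample_weights)
```

Here the rare class receives the larger class weight, and the most precise measurement receives the largest sample weight.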

If I should give the sample_weights for each chunk, how do I map the weights if some of the classes are missing from a specific chunk? Let me give an example. In my overall dataset, I have 7 possible classes (labels). Because these classes are highly imbalanced, when I create smaller chunks of data as output from the generator, some of the classes are missing from a given chunk. How should I create the sample_weights for these chunks?

This is not an issue. sample_weights is defined on a per-sample basis and is independent from the class. For this reason, the documentation states that (inputs, targets, sample_weights) should be the same length.
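
A minimal sketch of such a generator follows. It assumes integer class labels (with one-hot targets you would take the argmax first) and a class-to-weight mapping computed once on the full dataset. The `CLASS_WEIGHT` values, the `chunk_generator` name, and the toy data are hypothetical. Each chunk simply looks up a weight for every sample it contains, so classes missing from a chunk never come into play.

```python
import numpy as np

# Hypothetical weights for the 7 classes, computed once on the full dataset.
CLASS_WEIGHT = {0: 0.2, 1: 0.5, 2: 1.0, 3: 2.0, 4: 4.0, 5: 8.0, 6: 16.0}


def chunk_generator(features, labels, chunk_size, class_weight):
    """Yield (inputs, targets, sample_weights) tuples, looping indefinitely."""
    # Lookup table indexed by class id, so table[y] maps labels to weights.
    table = np.array([class_weight[c] for c in range(len(class_weight))])
    n = len(labels)
    while True:
        for start in range(0, n, chunk_size):
            x = features[start:start + chunk_size]
            y = labels[start:start + chunk_size]
            # One weight per sample; absent classes are never looked up.
            w = table[y]
            yield x, y, w


# Toy data: six samples, only classes 0, 3 and 6 are present.
features = np.arange(12, dtype=float).reshape(6, 2)
labels = np.array([0, 0, 3, 0, 6, 0])

gen = chunk_generator(features, labels, chunk_size=3, class_weight=CLASS_WEIGHT)
x1, y1, w1 = next(gen)  # chunk containing classes 0 and 3 only
x2, y2, w2 = next(gen)  # chunk containing classes 0 and 6 only
```

Every yielded tuple has inputs, targets, and sample_weights of the same length, which is the requirement the documentation states. A generator like this would then be passed to fit_generator together with a steps_per_epoch value.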

The function _weighted_masked_objective in engine/training.py has an example of how sample_weights are applied.
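
The general idea can be illustrated with a simplified NumPy sketch. This is not the Keras implementation: `weighted_mean_loss` is a hypothetical helper that only shows per-sample losses being scaled by their weights before averaging, and the real function handles masking and other details that are omitted here.

```python
import numpy as np


def weighted_mean_loss(per_sample_loss, sample_weights):
    """Scale each sample's loss by its weight, then average over the batch."""
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    sample_weights = np.asarray(sample_weights, dtype=float)
    return float(np.mean(per_sample_loss * sample_weights))


# Made-up per-sample losses for a batch of three.
losses = np.array([0.5, 1.0, 2.0])

# Uniform weights reduce to the plain mean: (0.5 + 1.0 + 2.0) / 3.
unweighted = weighted_mean_loss(losses, np.ones(3))

# Up-weighting the third sample makes its loss dominate: (0.5 + 1.0 + 8.0) / 3.
weighted = weighted_mean_loss(losses, np.array([1.0, 1.0, 4.0]))

print(unweighted, weighted)
```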
