Tackling Class Imbalance: scaling contribution to loss and SGD

Question

(An update has been added to this question; see below.)

I am a graduate student at the University of Ghent, Belgium; my research is about emotion recognition with deep convolutional neural networks. I'm using the Caffe framework to implement the CNNs.

Recently I've run into a problem concerning class imbalance. I'm using 9216 training samples, approx. 5% are labeled positively (1), the remaining samples are labeled negatively (0).

I'm using the SigmoidCrossEntropyLoss layer to calculate the loss. When training, the loss decreases and the accuracy is extremely high even after only a few epochs. This is due to the imbalance: the network simply always predicts negative (0). (Precision and recall are both zero, backing up this claim.)

To solve this problem, I would like to scale the contribution to the loss depending on the prediction-truth combination (punish false negatives severely). My mentor/coach has also advised me to use a scale factor when backpropagating through stochastic gradient descent (sgd): the factor would be correlated to the imbalance in the batch. A batch containing only negative samples would not update the weights at all.
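
To make the idea concrete, here is a minimal NumPy sketch (not Caffe code; the weight values and the batch-level factor are assumptions chosen purely for illustration) of a sigmoid cross-entropy in which each sample's term is scaled by its prediction-truth combination and the whole batch is scaled by its fraction of positives, so that an all-negative batch contributes nothing:

import numpy as np

def weighted_sigmoid_ce(scores, labels, w_tp=1.0, w_fp=1.0, w_tn=1.0, w_fn=5.0):
    """Sigmoid cross-entropy with per-sample weights (NumPy sketch, not Caffe code).

    scores : raw network outputs (logits), shape (N,)
    labels : ground-truth 0/1 labels,      shape (N,)
    w_*    : illustrative weights for true/false positives/negatives;
             w_fn > 1 punishes false negatives more severely.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)

    probs = 1.0 / (1.0 + np.exp(-scores))       # sigmoid
    preds = (probs >= 0.5).astype(np.int64)     # hard predictions

    # choose one weight per sample from the prediction-truth combination
    weights = np.where(labels == 1,
                       np.where(preds == 1, w_tp, w_fn),
                       np.where(preds == 1, w_fp, w_tn))

    # standard per-sample binary cross-entropy
    eps = 1e-12
    ce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))

    # batch-level factor correlated with the imbalance: the fraction of
    # positives, so a batch containing only negative samples yields zero
    # loss (and hence zero gradient / weight update)
    batch_factor = labels.mean()

    return batch_factor * np.mean(weights * ce)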

So far I have added only one custom layer to Caffe, one that reports additional metrics such as precision and recall. My experience with the Caffe code is limited, but I have a lot of expertise writing C++ code.

Could anyone help me or point me in the right direction on how to adjust the SigmoidCrossEntropyLoss and Sigmoid layers to accommodate the following changes:

  1. Adjust a sample's contribution to the total loss depending on the prediction-truth combination (true positive, false positive, true negative, false negative).
  2. Scale the weight update performed by stochastic gradient descent depending on the imbalance in the batch (negatives vs. positives).

Thanks in advance!

Update: I have incorporated the InfogainLossLayer as suggested by Shai (https://stackoverflow.com/a/30497907/1714410). I've also added another custom layer that builds the infogain matrix H based on the imbalance in the current batch.

Currently, the matrix is configured as follows:

H(i, j) = 0          if i != j
H(i, j) = 1 - f(i)   if i == j (with f(i) = the frequency of class i in the batch)
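
As a plain NumPy sketch of that rule (for reference only; the actual layer is implemented inside Caffe in C++), the matrix can be computed from a batch's label vector like this:

import numpy as np

def build_infogain_matrix(batch_labels, num_classes=2):
    """H(i, j) = 0 for i != j and 1 - f(i) on the diagonal, where f(i) is
    the frequency of class i in the batch (NumPy sketch of the rule above)."""
    labels = np.asarray(batch_labels)
    freq = np.bincount(labels, minlength=num_classes) / float(labels.size)
    return np.diag(1.0 - freq).astype(np.float32)

# example: a batch with a 10:1 negative-to-positive imbalance
H = build_infogain_matrix([0] * 10 + [1])
# H ~ [[0.09, 0.  ],   -> frequent negative class gets a small weight
#      [0.  , 0.91]]   -> rare positive class gets a large weight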

I plan to experiment with different configurations for the matrix in the future.

I have tested this on a 10:1 imbalance. The results have shown that the network is learning useful things now: (results after 30 epochs)

  • accuracy: approx. ~70% (down from ~97%);
  • precision: approx. ~20% (up from 0%);
  • recall: approx. ~60% (up from 0%).

These numbers were reached at around 20 epochs and didn't change significantly after that.

!! The results above are merely a proof of concept; they were obtained by training a simple network on a 10:1 imbalanced dataset. !!

Answer

Why don't you use the InfogainLoss layer to compensate for the imbalance in your training set?

The Infogain loss is defined using a weight matrix H (in your case 2-by-2). The meaning of its entries is:

[cost of predicting 1 when gt is 0,    cost of predicting 0 when gt is 0
 cost of predicting 1 when gt is 1,    cost of predicting 0 when gt is 1]

So, you can set the entries of H to reflect the difference between errors in predicting 0 or 1.

You can find how to define matrix H for caffe in this thread.
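
The pattern from that thread, as a sketch (assuming pycaffe is available; the concrete weight values below are only an example and would need tuning), is to store H as a 1x1xKxK blob in a binaryproto file, which the "InfogainLoss" layer can then read through its infogain_loss_param:

import numpy as np
import caffe

# illustrative 2x2 weight matrix: the rare positive class gets a much
# larger weight than the frequent negative class
H = np.array([[0.1, 0.0],
              [0.0, 0.9]], dtype=np.float32)

# Caffe reads the matrix from a binaryproto blob of shape 1x1xKxK
blob = caffe.io.array_to_blobproto(H.reshape((1, 1, 2, 2)))
with open('infogain_H.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())

# the file is then referenced from the loss layer in the prototxt, e.g.
# infogain_loss_param { source: "infogain_H.binaryproto" }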

Regarding sample weights, you may find this post interesting: it shows how to modify the SoftmaxWithLoss layer to take into account sample weights.

Recently, a modification of the cross-entropy loss was proposed by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár in Focal Loss for Dense Object Detection (ICCV 2017).
The idea behind focal loss is to assign a different weight to each example based on the relative difficulty of predicting that example (rather than based on class size etc.). From the brief time I got to experiment with this loss, it feels superior to "InfogainLoss" with class-size weights.
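
For reference, a minimal NumPy sketch of the binary (sigmoid) focal loss from that paper, with the usual alpha-balancing term (gamma = 2 and alpha = 0.25 are the defaults reported there):

import numpy as np

def binary_focal_loss(scores, labels, gamma=2.0, alpha=0.25):
    """Binary focal loss (Lin et al., ICCV 2017) -- NumPy sketch.

    scores : raw logits, shape (N,)
    labels : 0/1 ground truth, shape (N,)
    gamma  : focusing parameter; gamma = 0 recovers plain cross-entropy
    alpha  : balancing weight applied to the positive class
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels)

    p = 1.0 / (1.0 + np.exp(-scores))                    # sigmoid probabilities
    p_t = np.where(labels == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)  # per-class balancing

    # easy, well-classified examples (p_t close to 1) are down-weighted
    # by the modulating factor (1 - p_t)^gamma
    eps = 1e-12
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)))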
