Tackling Class Imbalance: scaling contribution to loss and SGD


Question

(An update to this question has been added below.)

I am a graduate student at the University of Ghent, Belgium; my research is about emotion recognition with deep convolutional neural networks. I'm using the Caffe framework to implement the CNNs.

Recently I've run into a problem concerning class imbalance. I'm using 9216 training samples; approx. 5% are labeled positive (1), the remaining samples are labeled negative (0).

I'm using the SigmoidCrossEntropyLoss layer to calculate the loss. When training, the loss decreases and the accuracy is extremely high even after a few epochs. This is due to the imbalance: the network simply always predicts negative (0). (Precision and recall are both zero, backing this claim.)

To solve this problem, I would like to scale the contribution to the loss depending on the prediction-truth combination (punishing false negatives severely). My mentor/coach has also advised me to use a scale factor when backpropagating through stochastic gradient descent (SGD): the factor would be correlated to the imbalance in the batch. A batch containing only negative samples would not update the weights at all.

So far I have only added one custom layer to Caffe: one that reports additional metrics such as precision and recall. My experience with the Caffe code is limited, but I do have a lot of expertise writing C++ code.

Could anyone help me, or point me in the right direction, on how to adjust the SigmoidCrossEntropyLoss and Sigmoid layers to accommodate the following changes:

  1. Scale a sample's contribution to the total loss depending on the prediction-truth combination (true positive, false positive, true negative, false negative); see the sketch after this list.
  2. Scale the weight update performed by stochastic gradient descent depending on the imbalance in the batch (negatives vs. positives).
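
To make the first point concrete, here is a minimal NumPy sketch (not Caffe code) of the kind of weighting I have in mind; the weights w_pos and w_neg are illustrative values I picked, not something prescribed by Caffe:

import numpy as np

def weighted_sigmoid_cross_entropy(logits, labels, w_pos=10.0, w_neg=1.0):
    # Per-sample sigmoid cross-entropy where positive (minority) samples are
    # up-weighted, so missing them (false negatives) dominates the loss.
    # logits, labels: 1-D arrays of equal length, labels in {0, 1}.
    probs = 1.0 / (1.0 + np.exp(-logits))          # sigmoid
    eps = 1e-12                                     # avoid log(0)
    per_sample = -(labels * np.log(probs + eps) +
                   (1 - labels) * np.log(1 - probs + eps))
    weights = np.where(labels == 1, w_pos, w_neg)
    return np.mean(weights * per_sample)

# Toy batch with a 9:1 negative/positive imbalance; the network tends to
# predict negative (mostly negative logits).
logits = np.array([-2.0, -1.5, -3.0, -0.5, -2.2, -1.0, -2.8, -1.7, -2.1, 0.3])
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
print(weighted_sigmoid_cross_entropy(logits, labels))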

Thanks in advance!

Update: I have incorporated the InfogainLossLayer as suggested by Shai. I've also added another custom layer that builds the infogain matrix H based on the imbalance in the current batch.

Currently, the matrix is configured as follows:

H(i, j) = 0          if i != j
H(i, j) = 1 - f(i)   if i == j (with f(i) = the frequency of class i in the batch)
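
For reference, a small NumPy sketch of how such a batch-dependent H could be built (the helper name build_infogain_matrix is my own, not a Caffe API):

import numpy as np

def build_infogain_matrix(batch_labels, num_classes=2):
    # H(i, j) = 0 for i != j, H(i, i) = 1 - f(i), with f(i) the frequency of
    # class i in the batch: the rarer a class, the closer its diagonal entry
    # is to 1, so its errors weigh more.
    batch_labels = np.asarray(batch_labels, dtype=int)
    freqs = np.bincount(batch_labels, minlength=num_classes) / float(len(batch_labels))
    return np.diag(1.0 - freqs)

# Toy batch with a 10:1 negative/positive imbalance:
print(build_infogain_matrix([0] * 10 + [1]))
# -> diag(1 - 10/11, 1 - 1/11), approx. diag(0.09, 0.91)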

I am planning on experimenting with different configurations of the matrix in the future.

I have tested this on a 10:1 imbalance. The results show that the network is now learning useful things (results after 30 epochs):

  • Accuracy is approx. ~70% (down from ~97%);
  • Precision is approx. ~20% (up from 0%);
  • Recall is approx. ~60% (up from 0%).

These numbers were reached at around 20 epochs and didn't change significantly after that.

!! The results above are merely a proof of concept; they were obtained by training a simple network on a 10:1 imbalanced dataset. !!

Answer

Why don't you use the InfogainLoss layer to compensate for the imbalance in your training set?

The Infogain loss is defined using a weight matrix H (in your case 2-by-2); the meaning of its entries is:

[cost of predicting 1 when gt is 0,    cost of predicting 0 when gt is 0
 cost of predicting 1 when gt is 1,    cost of predicting 0 when gt is 1]

So, you can set the entries of H to reflect the difference between errors in predicting 0 or 1.

You can find how to define the matrix H for Caffe in this thread.
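
For completeness, one common recipe is to store H in a binaryproto file that the InfogainLoss layer can read; a rough pycaffe sketch, assuming a 2-class H with illustrative weights and a file name of your choosing:

import numpy as np
import caffe

# 2x2 infogain matrix; the weight 20.0 for the rare positive class is an
# illustrative value, not one taken from the question.
H = np.array([[1.0, 0.0],
              [0.0, 20.0]], dtype='f4')

# Store the matrix as a 1x1xLxL blob (L = number of classes) in a
# binaryproto file that the InfogainLoss layer can load.
blob = caffe.io.array_to_blobproto(H.reshape((1, 1, 2, 2)))
with open('H.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())

The loss layer can then be pointed at this file through its infogain_loss_param (source: "H.binaryproto"); alternatively, H can be supplied to the layer as an extra bottom blob.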

Regarding sample weights, you may find this post interesting: it shows how to modify the SoftmaxWithLoss layer to take sample weights into account.
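
In NumPy terms, the gist of such a sample-weighted loss is simply multiplying each sample's cross-entropy by its own weight before averaging; a rough sketch (not the code from that post):

import numpy as np

def weighted_softmax_cross_entropy(scores, labels, sample_weights):
    # scores: (N, C) raw class scores; labels: (N,) integer class ids;
    # sample_weights: (N,) per-sample weights, e.g. higher for rare classes.
    shifted = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]        # per-sample loss
    return np.sum(sample_weights * nll) / np.sum(sample_weights)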

Recently, a modification of the cross-entropy loss was proposed by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollár: Focal Loss for Dense Object Detection (ICCV 2017).
The idea behind focal loss is to assign a different weight to each example based on the relative difficulty of predicting that example (rather than based on class size etc.). From the brief time I got to experiment with this loss, it feels superior to InfogainLoss with class-size weights.
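
For illustration, a minimal NumPy sketch of the binary focal loss as described in the paper (gamma=2 and alpha=0.25 are the defaults reported there):

import numpy as np

def binary_focal_loss(logits, labels, gamma=2.0, alpha=0.25):
    # Cross-entropy scaled by (1 - p_t)^gamma, so easy, well-classified
    # examples contribute little; alpha balances the two classes.
    probs = 1.0 / (1.0 + np.exp(-logits))
    p_t = np.where(labels == 1, probs, 1.0 - probs)      # prob. of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    eps = 1e-12
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps))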

