Best loss function for multi-class classification when the dataset is imbalanced?


Problem description


I'm currently using the cross-entropy loss function, but with an imbalanced dataset the performance is not great.

Is there a better loss function?

Solution

It's a very broad subject, but IMHO you should try focal loss. It was introduced by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollar to handle class imbalance in object detection, and since its introduction it has also been used in the context of segmentation.
The idea of focal loss is to reduce both the loss and the gradient for correct (or almost correct) predictions while emphasizing the gradient of errors.
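Concretely, writing p_t for the probability the model assigns to the true class of a sample, focal loss replaces the cross-entropy term -log(p_t) with

    FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)

where gamma >= 0 is the focusing parameter (gamma = 0 recovers plain cross entropy; the paper found gamma = 2 to work well in practice) and alpha_t is an optional per-class weight. The modulating factor (1 - p_t)^gamma goes to zero as p_t approaches 1, so confident correct predictions contribute almost nothing to the loss.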

As you can see in the graph from the focal loss paper (loss plotted against the probability assigned to the ground-truth class):

The blue curve is the regular cross-entropy loss: on the one hand it has non-negligible loss and gradient even for well-classified examples, and on the other hand it has a relatively weak gradient for erroneously classified examples.
In contrast, focal loss (all other curves) has a smaller loss and weaker gradient for well-classified examples, and stronger gradients for erroneously classified examples.
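To make the idea concrete, here is a minimal NumPy sketch of multi-class focal loss. The function name and signature are illustrative, not from any particular library; it assumes the model already outputs normalized class probabilities (e.g. after a softmax).

```python
import numpy as np

def focal_loss(probs, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss, averaged over the batch.

    probs   : (N, C) array of predicted class probabilities (rows sum to 1)
    targets : (N,) array of integer class labels in [0, C)
    gamma   : focusing parameter; gamma = 0 recovers cross entropy
    alpha   : optional (C,) array of per-class weights
    """
    n = probs.shape[0]
    # Probability the model assigns to the true class of each sample.
    p_t = probs[np.arange(n), targets]
    p_t = np.clip(p_t, 1e-12, 1.0)  # numerical safety for log()
    # The modulating factor (1 - p_t)^gamma down-weights easy examples.
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if alpha is not None:
        loss = loss * np.asarray(alpha)[targets]
    return loss.mean()
```

With gamma = 0 this reduces to ordinary cross entropy, and for gamma > 0 a well-classified example (large p_t) is strongly down-weighted relative to a misclassified one, which is exactly the behavior described above. In PyTorch or TensorFlow you would implement the same formula on log-softmax outputs for better numerical stability.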

