Best loss function for multi-class classification when the dataset is imbalanced?


Problem Description

I'm currently using the Cross Entropy Loss function, but with the imbalanced data-set the performance is not great.

Is there a better loss function?

Solution

It's a very broad subject, but IMHO, you should try focal loss. It was introduced by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He and Piotr Dollar to handle class imbalance in object detection, and since its introduction it has also been used in the context of segmentation.
The idea of the focal loss is to reduce both loss and gradient for correct (or almost correct) predictions while emphasizing the gradient of errors.

As you can see in the graph from the focal loss paper:

The blue curve is the regular cross entropy loss: on the one hand it has non-negligible loss and gradient even for well-classified examples, and on the other hand it has a relatively weak gradient for erroneously classified examples.
In contrast, focal loss (all the other curves) has a smaller loss and weaker gradient for well-classified examples, and stronger gradients for erroneously classified examples.
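To make the behavior concrete, here is a minimal, self-contained NumPy sketch of the multi-class focal loss, FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t is the probability the model assigns to the true class. The function name, the example logits, and the choice γ = 2 are illustrative, not from the original answer; in practice you would implement this on top of your framework's cross-entropy primitives.

```python
import numpy as np

def focal_loss(logits, labels, gamma=2.0):
    """Multi-class focal loss: down-weights easy (well-classified) examples.

    logits: (N, C) raw class scores; labels: (N,) integer class ids.
    gamma is the focusing parameter; gamma=0 recovers plain cross entropy.
    """
    # Numerically stable softmax probabilities
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Probability assigned to the true class of each example
    p_t = probs[np.arange(len(labels)), labels]
    # The modulating factor (1 - p_t)^gamma shrinks the loss of easy examples
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

logits = np.array([[4.0, 0.0, 0.0],   # easy example: confidently correct
                   [0.0, 0.5, 0.0]])  # hard example: nearly uniform scores
labels = np.array([0, 1])
ce = focal_loss(logits, labels, gamma=0.0)  # plain cross entropy
fl = focal_loss(logits, labels, gamma=2.0)  # focal loss
```

With γ = 2 the easy example's contribution is scaled down far more than the hard example's, so `fl` comes out smaller than `ce` and the hard example dominates the average, which is exactly the reweighting the curves above illustrate.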

