TensorFlow: What is wrong with my (generalized) dice loss implementation?

Problem description

I use TensorFlow 1.12 for semantic (image) segmentation based on materials. With a multinomial cross-entropy loss function, this yields okay-ish results, especially considering the sparse amount of training data I'm working with, with an mIoU of 0.44:

When I replace this with my dice loss implementation, however, the network predicts far fewer of the smaller segmentations, which is contrary to my understanding of its theory. I thought it's supposed to work better with imbalanced datasets and should be better at predicting the smaller classes:

A table visualizes this better; as you can see, with dice loss a lot more of the smaller classes are never predicted (hence the undefined precision). With cross-entropy, at least some predictions are made for every class:

I initially thought that this was the network's way of increasing mIoU (since my understanding is that dice loss directly optimizes the dice score). However, mIoU with dice loss is 0.33 compared to cross-entropy's 0.44, so it has failed in that regard. I'm now wondering whether my implementation is correct:

def dice_loss(onehots_true, logits):
    # Turn logits into per-class probabilities.
    probabilities = tf.nn.softmax(logits)
    #weights = 1.0 / ((tf.reduce_sum(onehots_true, axis=0)**2) + 1e-3)
    #weights = tf.clip_by_value(weights, 1e-17, 1.0 - 1e-7)
    # Intersection and "union" terms, summed along axis 0.
    numerator = tf.reduce_sum(onehots_true * probabilities, axis=0)
    #numerator = tf.reduce_sum(weights * numerator)
    denominator = tf.reduce_sum(onehots_true + probabilities, axis=0)
    #denominator = tf.reduce_sum(weights * denominator)
    # Smoothed dice loss, one value per class.
    loss = 1.0 - 2.0 * (numerator + 1) / (denominator + 1)
    return loss

Some implementations I found use weights, though I am not sure why, since mIoU isn't weighted either. At any rate, when I use weights, training stops prematurely after a few epochs with dreadful test results, hence I commented them out.

Does anyone see anything wrong with my dice loss implementation? I pretty faithfully followed online examples.

In order to speed up the labeling process, I only annotated with parallelogram-shaped polygons, and I copied some annotations from a larger dataset. This resulted in only a couple of ground-truth segmentations per image:

(This image actually contains slightly more annotations than average.)

Answer

I'm going to add the formula for the reference of anyone who comes across this question in the future. The generalized dice loss is given by:

$$\mathrm{GDL} = 1 - 2\,\frac{\sum_{l} w_l \sum_{n} r_{ln}\, p_{ln}}{\sum_{l} w_l \sum_{n} \left(r_{ln} + p_{ln}\right)}, \qquad w_l = \frac{1}{\left(\sum_{n} r_{ln}\right)^{2}}$$

(Formula from Sudre et al.)

Classes are indexed by l and pixel locations by n; r_ln is the one-hot ground-truth label, and the probabilities p_ln can be generated using a softmax or sigmoid at your network's output.

In your implementation, the loss is summed across the batch. That produces a very large loss value, and your network's gradients will explode. Instead, you need to use the average. Note that the weights are required to combat the class imbalance problem.
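
For illustration, here is a minimal sketch of how that fix could look in TF 1.x, assuming onehots_true and logits both have shape [batch, height, width, num_classes]; the function name generalized_dice_loss and the eps smoothing constant are my own additions, not from the original post:

import tensorflow as tf

def generalized_dice_loss(onehots_true, logits, eps=1e-6):
    # Assumed shapes: [batch, height, width, num_classes].
    probabilities = tf.nn.softmax(logits, axis=-1)
    # Sum over the pixel locations n of each image, keeping batch and class axes.
    spatial_axes = [1, 2]
    # Per-class weights w_l = 1 / (sum_n r_ln)^2 to counter class imbalance;
    # eps guards against division by zero for classes absent from an image.
    weights = 1.0 / (tf.square(tf.reduce_sum(onehots_true, axis=spatial_axes)) + eps)
    numerator = tf.reduce_sum(onehots_true * probabilities, axis=spatial_axes)
    denominator = tf.reduce_sum(onehots_true + probabilities, axis=spatial_axes)
    # Weighted sums over the classes l give one dice loss per image...
    per_image = 1.0 - 2.0 * (tf.reduce_sum(weights * numerator, axis=-1) + eps) / \
                            (tf.reduce_sum(weights * denominator, axis=-1) + eps)
    # ...and averaging (rather than summing) over the batch keeps the loss
    # magnitude independent of batch size.
    return tf.reduce_mean(per_image)

Note that the weighted sums run over the class axis before the division, matching the generalized formulation above, rather than producing one ratio per class.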

There is no concrete proof that GDL outperforms cross-entropy, save in a very specific example noted in the paper. GDL is attractive because it is directly related to IoU, ergo the loss function and evaluation metrics would improve hand-in-hand. If you still haven't managed to train your network, I'd recommend moving to cross-entropy for good.
