from_logits=True and from_logits=False get different training result for tf.losses.CategoricalCrossentropy for UNet

Question

I am doing an image semantic segmentation job with UNet. If I set the softmax activation for the last layer like this:

...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)
conv10 = Activation('softmax')(conv9)  # per-pixel class probabilities as the model output
model = Model(inputs, conv10)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False), the training will not converge, even for only one training image.

But if I do not set the softmax activation for the last layer, like this:

...
conv9 = Conv2D(n_classes, (3, 3), padding='same')(conv9)  # raw logits as the model output
model = Model(inputs, conv9)
return model
...

and then use loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True), the training will converge for one training image.

My groundtruth dataset is generated like this:

import cv2
import numpy as np

X = []
Y = []
im = cv2.imread(impath)                  # input image
X.append(im)
seg_labels = np.zeros((height, width, n_classes))
for c, spath in enumerate(segpaths):     # c indexes the class channel for this mask
    mask = cv2.imread(spath, 0)          # read the mask as a single-channel image
    seg_labels[:, :, c] += mask
Y.append(seg_labels.reshape(width*height, n_classes))

Why? Is there something wrong with my usage?

This is my experiment code on git: https://github.com/honeytidy/unet. You can check it out and run it (it can run on CPU). You can change the Activation layer and the from_logits argument of CategoricalCrossentropy and see what I mean.

Answer

Pushing the "softmax" activation into the cross-entropy loss layer significantly simplifies the loss computation and makes it more numerically stable.
It might be the case that in your example the numerical issues are significant enough to render the training process ineffective for the from_logits=False option.
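
A minimal sketch of what this means in practice (assuming TensorFlow 2.x; the made-up logit values are only illustrative and the exact numbers may vary with the Keras version), for a single pixel with fairly extreme logits:

import tensorflow as tf

# One pixel, 3 classes; the true class is one the network currently
# considers very unlikely (e.g. right after a bad initialization).
logits = tf.constant([[18.0, -9.0, -12.0]])
y_true = tf.constant([[0.0, 1.0, 0.0]])

# Fused path: the loss works on raw logits via a stable log-sum-exp.
cce_logits = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
print(float(cce_logits(y_true, logits)))   # ~27.0, the exact cross entropy

# Separate path: softmax first, then cross entropy on probabilities.
probs = tf.nn.softmax(logits)
cce_probs = tf.keras.losses.CategoricalCrossentropy(from_logits=False)
print(float(cce_probs(y_true, probs)))     # ~16.1, the tiny probability is clipped at Keras' epsilon (1e-7)

With from_logits=True the large loss (and a useful gradient) is preserved; with from_logits=False the tiny true-class probability gets clipped, so the loss and its gradient saturate and training can stall.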

You can find a derivation of the cross entropy loss (a special case of "info gain" loss) in this post. This derivation illustrates the numerical issues that are averted when combining softmax with cross entropy loss.
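
The key numerical step, sketched here in generic notation (not necessarily the exact form used in the linked post): for logits x and true class y,

-log softmax(x)_y = -x_y + log Σ_j exp(x_j) = -x_y + m + log Σ_j exp(x_j - m),   where m = max_j x_j

The fused form on the right never exponentiates anything larger than zero and never takes the log of a probability that may have underflowed or been clipped to a tiny value, which is exactly what the separate softmax-then-log path has to do.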
