How to implement pixel-wise classification for scene labeling in TensorFlow?

Views: 1862 Published: 2016/12/26 10:56:56 Tags: computer-vision classification tensorflow scene labeling

Problem description

I am working on a deep learning model using Google's TensorFlow. The model should be used to segment and label scenes.

I am using the SiftFlow dataset which has 33 semantic classes and images with 256x256 pixels.

As a result, at my final layer, after convolution and deconvolution, I arrive at a tensor (array) of shape [256, 256, 33].

Next I would like to apply Softmax and compare the results to a semantic label of size [256, 256].

Question: Should I apply mean averaging or argmax to my final layer so that its shape becomes [256, 256, 1], and then loop through each pixel and classify as if I were classifying 256x256 instances? If yes, how? If not, what other options are there?

Solution

To apply softmax and use a cross entropy loss, you have to keep intact the final output of your network of size batch_size x 256 x 256 x 33. Therefore you cannot use mean averaging or argmax because it would destroy the output probabilities of your network.
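
To make the distinction concrete, here is a minimal NumPy sketch (not TensorFlow, and not part of the original answer) of where argmax does belong: at prediction time, after the full [batch_size, height, width, num_classes] output has been kept. The `softmax` helper, the random logits, and the small 4x4 spatial size are assumptions chosen only for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Small spatial size for illustration; the question uses 256 x 256 and 33 classes.
batch_size, height, width, num_classes = 2, 4, 4, 33
logits = rng.normal(size=(batch_size, height, width, num_classes))

# Per-pixel class probabilities: the class axis is kept intact.
probs = softmax(logits)            # shape [batch_size, height, width, num_classes]

# argmax collapses the class axis, so it is only suitable for producing
# the final label map, not as input to a cross entropy loss.
predicted = probs.argmax(axis=-1)  # shape [batch_size, height, width]
```

The loss needs `logits` with the class axis still present; `predicted` only carries the winning class id per pixel.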

You have to loop through all the batch_size x 256 x 256 pixels and apply a cross entropy loss to the prediction for each pixel. This is easy with the built-in function tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels).

Some warnings from the doc before applying the code below:

WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

logits must have the shape [batch_size, num_classes] and the dtype float32 or float64.

labels must have the shape [batch_size] and the dtype int64.
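
The first warning can be illustrated with a small NumPy sketch. The function `sparse_softmax_cross_entropy` below is a hypothetical stand-in written for this example; it mimics the documented contract (logits of shape [n, num_classes], integer labels of shape [n]) but is not the TensorFlow implementation. The sample values are made up.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """logits: [n, num_classes] floats; labels: [n] integer class ids.

    Returns the per-example cross entropy, shape [n]. The softmax is
    applied internally, so the input must be unscaled logits.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.shape[0]), labels]

logits = np.array([[2.0, 0.0, -1.0],
                   [0.5, 0.5, 3.0]])
labels = np.array([0, 2], dtype=np.int64)

# Correct usage: pass the raw logits.
correct = sparse_softmax_cross_entropy(logits, labels)

# Incorrect usage: passing softmax outputs means softmax is applied twice,
# which flattens the distribution and changes the loss.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
wrong = sparse_softmax_cross_entropy(probs, labels)
```

In this example both labels point at the highest-scoring class, and the double softmax makes the loss larger than it should be.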

The trick is to use batch_size * 256 * 256 as the batch size required by the function. We will reshape logits and labels to this format. Here is the code I use:
    import tensorflow as tf

    # Input images
    inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3])
    # Network outputs of shape [batch_size, 256, 256, 33] (no final softmax!)
    logits = inference(inputs)
    # Labels of shape [batch_size, 256, 256]; class ids must be int64, not float32
    labels = tf.placeholder(tf.int64, [batch_size, 256, 256])

    reshaped_logits = tf.reshape(logits, [-1, 33])  # shape [batch_size*256*256, 33]
    reshaped_labels = tf.reshape(labels, [-1])      # shape [batch_size*256*256]
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=reshaped_logits, labels=reshaped_labels)
You can then apply your optimizer on that loss.
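
As a sanity check on the reshape trick, the following NumPy sketch compares the flattened computation against an explicit loop over every pixel. It is an illustration under assumptions: `sparse_softmax_cross_entropy` is a hypothetical NumPy stand-in for the TensorFlow op, and the random data and 3x3 spatial size are chosen only to keep the loop short.

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """logits: [n, num_classes] floats; labels: [n] ints. Returns [n] losses."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.shape[0]), labels]

rng = np.random.default_rng(0)
batch_size, height, width, num_classes = 2, 3, 3, 33
logits = rng.normal(size=(batch_size, height, width, num_classes))
labels = rng.integers(0, num_classes, size=(batch_size, height, width))

# Treat every pixel as one example: batch of size batch_size * height * width.
flat_losses = sparse_softmax_cross_entropy(
    logits.reshape(-1, num_classes), labels.reshape(-1))
mean_loss = flat_losses.mean()  # scalar that an optimizer would minimize

# Reference: loop over each pixel and compute its loss individually.
loop_losses = np.empty(labels.shape)
for b in range(batch_size):
    for i in range(height):
        for j in range(width):
            loop_losses[b, i, j] = sparse_softmax_cross_entropy(
                logits[b, i, j].reshape(1, -1), np.array([labels[b, i, j]]))[0]
```

Both routes give the same per-pixel losses, so the reshape only changes the bookkeeping. In TensorFlow the per-pixel loss vector would typically be reduced to a scalar, for example with tf.reduce_mean, before being handed to the optimizer.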

Update: v0.10

The documentation of tf.nn.sparse_softmax_cross_entropy_with_logits shows that it now accepts any shape for logits, so there is no need to reshape the tensors (thanks @chillinger):

    import tensorflow as tf

    # Input images
    inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3])
    # Network outputs of shape [batch_size, 256, 256, 33] (no final softmax!)
    logits = inference(inputs)
    # Labels of shape [batch_size, 256, 256]; class ids must be int64, not float32
    labels = tf.placeholder(tf.int64, [batch_size, 256, 256])
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
