How to load Image Masks (Labels) for Image Segmentation in Keras



I am using Tensorflow as a backend to Keras and I am trying to understand how to bring in my labels for image segmentation training.

I am using the LFW Parts Dataset, which has both a ground truth image and a ground truth mask for each of its ~1500 training images.

As I understand the process, during training, I load both the

  • (X) Image
  • (Y) Mask Image

I do this in batches to meet my needs. Now my question is: is it sufficient to just load them both (image and mask image) as (N, N, 3) NumPy arrays, or do I need to process/reshape the mask image in some way? Effectively, the masks/labels are represented as [R, G, B] pixels, where:

  • [255, 0, 0] Hair
  • [0, 255, 0] Face
  • [0, 0, 255] Background

I could do something like this to normalize it to 0-1, though I don't know if I should:

from PIL import Image
import numpy as np

im = Image.open(path)  # path to a mask image
label = np.array(im, dtype=np.uint8)
label = np.multiply(label, 1.0 / 255)  # scale 0-255 channel values to 0-1

so I end up with:

  • [1, 0, 0] Hair
  • [0, 1, 0] Face
  • [0, 0, 1] Background

Everything I found online uses existing datasets in TensorFlow or Keras. Nothing is really all that clear on how to pull things off if you have what could be considered a custom dataset.

I found this related to Caffe: https://groups.google.com/forum/#!topic/caffe-users/9qNggEa8EaQ

And they advocate converting the mask images to (H, W, 1) (HWC), where my classes would be 0, 1, 2 for Background, Hair, and Face respectively.

It may be that this is a duplicate (a combination of similar questions/answers):

How to implement multi-class semantic segmentation?

Tensorflow: How to create a Pascal VOC style image

I found one example that processes PascalVOC into (N, N, 1) that I adapted:

LFW_PARTS_PALETTE = {
    (0, 0, 255): 0,  # background (blue)
    (255, 0, 0): 1,  # hair (red)
    (0, 255, 0): 2,  # face (green)
}

def convert_from_color_segmentation(arr_3d):
    arr_2d = np.zeros((arr_3d.shape[0], arr_3d.shape[1]), dtype=np.uint8)
    palette = LFW_PARTS_PALETTE

    for i in range(0, arr_3d.shape[0]):
        for j in range(0, arr_3d.shape[1]):
            key = (arr_3d[i, j, 0], arr_3d[i, j, 1], arr_3d[i, j, 2])
            arr_2d[i, j] = palette.get(key, 0) # default value if key was not found is 0

    return arr_2d
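For large masks, the per-pixel Python loop above is slow. The same RGB-to-class-index mapping can be vectorized with NumPy comparisons (a sketch, assuming the masks contain exactly the three pure colors listed earlier; the function name is mine):

```python
import numpy as np

def convert_from_color_segmentation_fast(arr_3d):
    """Map an (H, W, 3) RGB mask to an (H, W) array of class indices."""
    palette = {
        (0, 0, 255): 0,  # background (blue)
        (255, 0, 0): 1,  # hair (red)
        (0, 255, 0): 2,  # face (green)
    }
    arr_2d = np.zeros(arr_3d.shape[:2], dtype=np.uint8)
    for color, class_id in palette.items():
        # boolean (H, W) map of pixels that exactly match this color
        matches = np.all(arr_3d == np.array(color, dtype=np.uint8), axis=-1)
        arr_2d[matches] = class_id
    return arr_2d
```

This replaces the double loop with three whole-array comparisons, which NumPy executes in C.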

I think this might be close to what I want, but not spot on. I think I need it to be (N, N, 3), since I have 3 classes? The version above, and another one like it, originated from these two locations:

https://github.com/martinkersner/train-CRF-RNN/blob/master/utils.py#L50

https://github.com/DrSleep/tensorflow-deeplab-resnet/blob/ce75c97fc1337a676e32214ba74865e55adc362c/deeplab_resnet/utils.py#L41 (this one one-hot encodes the values)

Solution

Since this is semantic segmentation, you are classifying each pixel in the image, so you would most likely use a cross-entropy loss. Keras, as well as TensorFlow, requires your mask to be one-hot encoded, so the output dimension of your mask should be something like [batch, height, width, num_classes]. Before computing the cross-entropy you then have to reshape both your logits and your mask to the tensor shape [-1, num_classes], where -1 denotes 'as many as required'.
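The one-hot encoding and the [-1, num_classes] reshape described above can be sketched in NumPy like this (a minimal illustration; the array values are made up):

```python
import numpy as np

num_classes = 3
# (H, W) class-index mask, e.g. the output of convert_from_color_segmentation
label_2d = np.array([[0, 1],
                     [2, 0]], dtype=np.uint8)

# one-hot encode: indexing the identity matrix turns each class index
# into a one-hot row, giving shape (H, W, num_classes)
one_hot = np.eye(num_classes, dtype=np.float32)[label_2d]

# flatten to [-1, num_classes] before the cross-entropy computation
flat = one_hot.reshape(-1, num_classes)
```

The same reshape is applied to the logits, so each row of both tensors corresponds to one pixel.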

Have a look here, at the end.

Since your question is about loading your own images: I just finished building an input pipeline for segmentation myself. It is in TensorFlow, though, so I don't know whether it helps you, but have a look if you are interested: Tensorflow input pipeline for segmentation
