Tensorflow U-Net segmentation mask input

Problem description

I am new to TensorFlow and semantic segmentation.

I am designing a U-Net for semantic segmentation. Each image has one object that I want to classify, but in total I have images of 10 different objects. I am confused: how should I prepare my mask input? Is this multi-label segmentation, or segmentation for only one class?

Should I convert my input to one-hot encoding? Should I use to_categorical? I found examples for multi-class segmentation, but I don't know whether that is the case here, because in each image I only have one object to detect/classify.

I tried using the following as my input code, but I am not sure whether what I am doing is right.

import os

import cv2
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Generation of batches of images and masks
class DataGen(keras.utils.Sequence):
    def __init__(self, image_names, path, batch_size, image_size=128):
        self.image_names = image_names
        self.path = path
        self.batch_size = batch_size
        self.image_size = image_size

    def __load__(self, image_name):
        # Paths
        image_path = os.path.join(self.path, "images/aug_test", image_name) + ".png"
        mask_path = os.path.join(self.path, "masks/aug_test", image_name) + ".png"

        # Reading image
        image = cv2.imread(image_path, 1)
        image = cv2.resize(image, (self.image_size, self.image_size))

        # Reading mask
        mask = cv2.imread(mask_path, -1)
        mask = cv2.resize(mask, (self.image_size, self.image_size))

        # Normalizing
        image = image/255.0
        mask = mask/255.0

        return image, mask

    def __getitem__(self, index):
        # Slicing already handles a shorter final batch, so there is no need
        # to mutate self.batch_size (doing so would also corrupt __len__)
        batch_names = self.image_names[index*self.batch_size : (index+1)*self.batch_size]

        image = []
        mask  = []

        for image_name in batch_names:
            _img, _mask = self.__load__(image_name)
            image.append(_img)
            mask.append(_mask)

        # This is where I am defining my input
        image = np.array(image)
        mask  = np.array(mask)
        mask = tf.keras.utils.to_categorical(mask, num_classes=10, dtype='float32')  # Is this right?

        return image, mask

    def __len__(self):
        return int(np.ceil(len(self.image_names)/float(self.batch_size)))

Is this right? If it is, then what should I change in my input to get the label/class as output? Should I change the pixel values of my mask according to my class?

Here is my U-Net architecture.

# Convolution and deconvolution Blocks

def down_scaling_block(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    pool = keras.layers.MaxPool2D((2, 2), (2, 2))(conv)
    return conv, pool

def up_scaling_block(x, skip, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv_t = keras.layers.UpSampling2D((2, 2))(x)
    concat = keras.layers.Concatenate()([conv_t, skip])
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(concat)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv

def bottleneck(x, filters, kernel_size=(3, 3), padding="same", strides=1):
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(x)
    conv = keras.layers.Conv2D(filters, kernel_size, padding=padding, strides=strides, activation="relu")(conv)
    return conv

def UNet():
    filters = [16, 32, 64, 128, 256]
    inputs = keras.layers.Input((image_size, image_size, 3))

    '''inputs2 = keras.layers.Input((image_size, image_size, 1))
       conv1_2, pool1_2 = down_scaling_block(inputs2, filters[0])'''

    Input = inputs
    conv1, pool1 = down_scaling_block(Input, filters[0])
    conv2, pool2 = down_scaling_block(pool1, filters[1])
    conv3, pool3 = down_scaling_block(pool2, filters[2])
    '''conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(pool2)
    conv3 = keras.layers.Conv2D(filters[2], kernel_size=(3,3), padding="same", strides=1, activation="relu")(conv3)
    drop3 = keras.layers.Dropout(0.5)(conv3)
    pool3 = keras.layers.MaxPooling2D((2,2), (2,2))(drop3)'''
    conv4, pool4 = down_scaling_block(pool3, filters[3])

    bn = bottleneck(pool4, filters[4])

    deConv1 = up_scaling_block(bn, conv4, filters[3]) #8 -> 16
    deConv2 = up_scaling_block(deConv1, conv3, filters[2]) #16 -> 32
    deConv3 = up_scaling_block(deConv2, conv2, filters[1]) #32 -> 64
    deConv4 = up_scaling_block(deConv3, conv1, filters[0]) #64 -> 128

    outputs = keras.layers.Conv2D(10, (1, 1), padding="same", activation="softmax")(deConv4)
    model = keras.models.Model(inputs, outputs)
    return model

model = UNet()
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["acc"])

train_gen = DataGen(train_img, train_path, image_size=image_size, batch_size=batch_size)
valid_gen = DataGen(valid_img, train_path, image_size=image_size, batch_size=batch_size)
test_gen = DataGen(test_img, test_path, image_size=image_size, batch_size=batch_size)

train_steps = len(train_img)//batch_size
valid_steps = len(valid_img)//batch_size

model.fit_generator(train_gen, validation_data=valid_gen, steps_per_epoch=train_steps, validation_steps=valid_steps, 
                    epochs=epochs)

I hope that I explained my question properly. Any help is appreciated!

UPDATE: I changed the value of each pixel in the mask according to the object class. (If the image contains an object that I want to classify as object no. 2, then I changed the mask pixel value to 2; the whole mask array then contains 0 (background) and 2 (object). Accordingly, for other objects the mask contains 0 and 3, 0 and 10, and so on.)

Here I first changed the mask to binary, and then, wherever the pixel value was greater than 1, I changed it to 1, 2, or 3 (according to the object/class number).
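
In code, that relabeling step looked roughly like this (a minimal sketch, assuming a single-channel mask file at mask_path and class_id being the object's class number):

import cv2
import numpy as np

class_id = 2  # e.g. the image contains object no. 2

mask = cv2.imread(mask_path, 0)       # read the mask as grayscale
mask = (mask > 0).astype(np.uint8)    # binarize: 0 = background, 1 = object
mask = mask * class_id                # relabel: mask now contains only 0 and class_id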

Then I converted them to one-hot with to_categorical, as shown in my code. Training runs, but the network doesn't learn anything; accuracy and loss keep swinging between two values. What is my mistake here? Am I making a mistake when generating the mask (changing the pixel values), or in the to_categorical call?

PROBLEM FOUND: I was making an error while creating the mask. I was reading the image with cv2, which reads images as height x width, but I was assigning the per-class pixel values while assuming my image dimensions were width x height. That was causing the problem and keeping the network from learning anything. It is working now.
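
For reference, the convention that tripped me up (a minimal illustration; the file name is hypothetical):

import cv2

img = cv2.imread("some_image.png")  # hypothetical file
print(img.shape)                    # (height, width, channels) -- NumPy order

# cv2.resize, however, takes its size argument as (width, height):
resized = cv2.resize(img, (200, 100))
print(resized.shape)                # (100, 200, 3)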

Answer

Each image has one object that I want to classify, but in total I have images of 10 different objects. I am confused: how should I prepare my mask input? Is this multi-label segmentation, or segmentation for only one class?

If your dataset has N different labels (i.e. 0 - background, 1 - dogs, 2 - cats, ...), you have a multi-class problem, even if each of your images contains only one kind of object.
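
For example (a toy illustration, not part of the original answer), a mask for an image that contains only cats still lives in the same N-label space; the absent classes simply never occur in it:

import numpy as np

# Hypothetical 4x4 mask for an image containing only a "cat" (class 2)
mask = np.array([[0, 0, 0, 0],
                 [0, 2, 2, 0],
                 [0, 2, 2, 0],
                 [0, 0, 0, 0]])

print(np.unique(mask))  # [0 2] -- still a multi-class (N-label) setup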

Should I convert my input to one-hot encoding? Should I use to_categorical?

Yes, you should one-hot encode your labels. Whether to use to_categorical comes down to the source format of your labels. Say you have N classes and your labels are (height, width, 1), where each pixel has a value in the range [0, N). In that case keras.utils.to_categorical(label, N) will produce a float (height, width, N) label, where each pixel is 0 or 1. And you don't have to divide by 255.
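
A small sketch of that conversion with a toy label map (the shapes follow the description above):

import numpy as np
from tensorflow.keras.utils import to_categorical

N = 10  # number of classes, including background

# Toy (height, width, 1) integer label map with values in [0, N)
label = np.zeros((4, 4, 1), dtype=np.uint8)
label[1:3, 1:3, 0] = 2  # a small region of class 2

one_hot = to_categorical(label, N)  # float array of shape (4, 4, 10)
print(one_hot.shape)   # (4, 4, 10)
print(one_hot[1, 1])   # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]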

If your source format is different, you may have to use a custom function to get the same output format.
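
For instance, a minimal sketch of such a custom function, assuming your masks are integer class-ID maps (this helper is illustrative, not from the original answer):

import numpy as np

def one_hot_from_ids(mask, num_classes):
    # Turn an integer-ID mask of shape (H, W) or (H, W, 1)
    # into a float one-hot array of shape (H, W, num_classes)
    mask = np.squeeze(mask).astype(np.int64)            # (H, W)
    return np.eye(num_classes, dtype=np.float32)[mask]  # (H, W, num_classes)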

Check out this repo (not my work): keras-unet. The notebooks folder contains two examples for training a u-net on small datasets. They are not multi-class, but it is easy to follow them step by step with your own dataset. Start by loading your labels as:

from PIL import Image
from keras.utils import to_categorical

im = Image.open(mask).resize((512,512))
im = to_categorical(im, NCLASSES)

Reshape and normalize like this:

x = np.asarray(imgs_np, dtype=np.float32)/255
y = np.asarray(masks_np, dtype=np.float32)
y = y.reshape(y.shape[0], y.shape[1], y.shape[2], NCLASSES)
x = x.reshape(x.shape[0], x.shape[1], x.shape[2], 3)

Adapt your model to NCLASSES:

from keras_unet.models import custom_unet  # from the keras-unet package

model = custom_unet(
    input_shape,
    use_batch_norm=False,
    num_classes=NCLASSES,
    filters=64,
    dropout=0.2,
    output_activation='softmax')

Choose the right loss:

from keras.optimizers import SGD
from keras_unet.metrics import iou, iou_thresholded

model.compile(
    optimizer=SGD(lr=0.01, momentum=0.99),
    loss='categorical_crossentropy',
    metrics=[iou, iou_thresholded])

Hope it helps.
