Load images and annotations from CSV and use fit_generator with multi-output models

Problem description

Following issue #10120, I am using the Keras functional API to build a model with multiple (five) outputs and the same input, in order to simultaneously predict different properties of the data (images in my case). All the metadata of the dataset are stored in different CSV files (one for training, one for validation and one for test data).

I have already written code to parse the CSV and save all different annotations into different numpy arrays (x_train.npy, emotions.npy etc.) which later I am loading in order to train my CNN.

First of all, what is the most efficient way to save the parsed annotations so that they can be loaded afterwards?

Is it better to read the annotations on the fly from the CSV file instead of saving them to numpy (or any other format)?
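
For reference, reading the annotations straight from the CSV at start-up could look like the sketch below. It assumes a header row, one column holding the image file name, and numeric label columns; the function name `load_annotations` and the column names in the usage note are hypothetical, not taken from the question. A handful of numbers per image is small next to the image array itself, so this step is unlikely to be where the memory goes.

```python
import csv

import numpy as np


def load_annotations(csv_path, filename_column, label_columns):
    """Read image file names and numeric labels from a CSV file with a header row."""
    filenames = []
    labels = {name: [] for name in label_columns}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            filenames.append(row[filename_column])
            for name in label_columns:
                labels[name].append(float(row[name]))
    # One small float32 array per annotation column.
    arrays = {name: np.asarray(values, dtype=np.float32)
              for name, values in labels.items()}
    return filenames, arrays
```

A call such as `load_annotations("train.csv", "file", ["age", "emotion"])` (file and column names made up for illustration) would return the list of file names and a dict with one array per label.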

When I load the saved numpy arrays (the following example contains only images and a single metadata)

 (x_train, y_train),(x_val, y_val)

Then I do

train_generator = datagen.flow(x_train, y_train, batch_size=32)

and finally

history = model.fit_generator(train_generator,
                        epochs=nb_of_epochs,
                        steps_per_epoch= steps_per_epoch,
                        validation_data=val_generator,
                        validation_steps=validation_steps,
                        callbacks=callbacks_list)

My program seems to consume up to 20-25GB of RAM for the whole duration of the training process (which is done on GPU). In case I add more than one output my program crashes because of that memory leak (max RAM I've got is 32GB).
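
A rough way to check whether the image array alone explains that figure is to compute its size from shape and dtype. The sketch below does that and also shows `np.load` with `mmap_mode="r"`, which opens a `.npy` file without reading it fully into RAM, so slicing a batch in a custom generator only touches that batch. The helper names are hypothetical, and the shape in the usage note is an assumption for illustration, not the dataset from the question.

```python
import numpy as np


def array_gigabytes(shape, dtype):
    """Size in GiB of a dense array with the given shape and dtype."""
    n_items = int(np.prod(shape, dtype=np.int64))
    return n_items * np.dtype(dtype).itemsize / 1024 ** 3


def open_readonly(npy_path):
    """Memory-map a .npy file; rows are read from disk only when sliced."""
    return np.load(npy_path, mmap_mode="r")
```

For example, `array_gigabytes((100000, 224, 224, 3), np.float32)` is four times `array_gigabytes((100000, 224, 224, 3), np.uint8)`, so the dtype the images are stored in matters a great deal.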

What is the correct way to load the parsed annotations alongside the raw images?

Let's say the above issue is fixed, what will be a correct approach to make use of ImageDataGenerator for multiple outputs like the following (discussed here as well)

Keras: How to use fit_generator with multiple outputs of different type

Xi[0], [Yi1[1], Yi2[1],Yi3[1], Yi4[1],Yi5[1]]
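
One pattern that can produce batches of that shape is to hand the iterator sample indices in place of labels and look the real labels up afterwards. The sketch below is a minimal version of that idea: `multi_output_flow` is a hypothetical helper, and it assumes that `datagen.flow(x_train, np.arange(len(x_train)), batch_size=32)` yields `(x_batch, index_batch)` pairs, which should be verified against the installed Keras version.

```python
import numpy as np


def multi_output_flow(index_flow, label_arrays):
    """Map (x_batch, index_batch) pairs to (x_batch, [y1_batch, y2_batch, ...]).

    index_flow: any iterator yielding image batches together with the dataset
        indices of the samples in each batch.
    label_arrays: list of arrays, one per model output, aligned with the images.
    """
    for x_batch, idx_batch in index_flow:
        idx = np.asarray(idx_batch).astype(np.int64)
        # Fancy indexing picks the labels that belong to this (shuffled) batch.
        yield x_batch, [labels[idx] for labels in label_arrays]
```

The label arrays stay untouched by augmentation, so this only fits outputs that do not depend on the image transform (class labels or attributes, not bounding boxes or masks).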

Recommended answer

from math import ceil


def multi_output_generator(hdf5_file, nb_data, batch_size):
    """ Generates batches of tensor image data in form of ==> x, [y1, y2, y3, y4, y5] for use in a multi-output Keras model.

        # Arguments
            hdf5_file: the hdf5 file which contains the images and the annotations.
            nb_data: total number of samples saved in the array.
            batch_size: size of the batch to generate tensor image data for.

        # Returns
            A five-output generator.
    """

    batches_list = list(range(int(ceil(float(nb_data) / batch_size))))

    while True:

        # loop over batches
        for i in batches_list:
            i_s = i * batch_size  # index of the first image in this batch
            i_e = min((i + 1) * batch_size, nb_data)  # one past the index of the last image in this batch

            # read images; slicing the dataset loads only this batch
            x = hdf5_file["x_train"][i_s:i_e, ...]

            # read labels
            y1 = hdf5_file["y1"][i_s:i_e]
            y2 = hdf5_file["y2"][i_s:i_e]
            y3 = hdf5_file["y3"][i_s:i_e]
            y4 = hdf5_file["y4"][i_s:i_e]
            y5 = hdf5_file["y5"][i_s:i_e]

            # yield inside the for loop so that every batch is emitted, not only the last one
            yield x, [y1, y2, y3, y4, y5]
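
Because the generator loops forever, `fit_generator` needs to be told how many batches make up one epoch. A small sketch of that calculation follows; `batches_per_epoch` is a hypothetical helper, and `train_file` and `nb_train` in the commented call are placeholder names for the open HDF5 file and its sample count.

```python
from math import ceil


def batches_per_epoch(nb_data, batch_size):
    """Number of batches needed to cover nb_data samples once (last batch may be short)."""
    return int(ceil(float(nb_data) / batch_size))


# Possible usage with the generator above (not executed here):
# model.fit_generator(multi_output_generator(train_file, nb_train, 32),
#                     steps_per_epoch=batches_per_epoch(nb_train, 32),
#                     epochs=nb_of_epochs)
```

Note that the generator reads the dataset named "x_train", so a validation generator built the same way would need a separate file using the same dataset names, or a variant that takes the names as arguments.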
