Use multiple directories for flow_from_directory in Keras


Problem description

My scenario is that we have multiple peers with their own data, located in different directories but with the same sub-directory structure. I want to train a model on all of this data, but if I copy everything into one folder, I can't keep track of whose data each file came from (new data is also created occasionally, so it isn't practical to re-copy the files every time). My data is currently stored like this:

-user01
-user02
-user03
...

(all of them have similar sub-directory structures)

I have searched for a solution, but I only found the multi-input case (here and here), where multiple inputs are concatenated into one parallel input, which is not my case.

I know that flow_from_directory() can only be fed from one directory at a time, so how can I build a custom generator that can be fed from multiple directories at once?

If my question is low quality, please give advice on how to improve it. I have also searched the Keras GitHub repository, but didn't find anything I could adapt.

Thanks.

Recommended answer

After so many days, I hope you have already found a solution to your problem, but I will share another idea here so that people like me who face the same problem in the future can get help.

A few days ago I had this kind of problem. As user3731622 said, follow_links will be a solution to your question (see the sketch below). Also, I think the idea of merging two data generators will work. However, in that case, the batch size of each sub-generator has to be set in proportion to the amount of data in its directory.
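
To make the follow_links route concrete: you can leave every user's data where it is and build one merged directory containing only symlinks, so nothing is copied and each file's provenance stays visible in its path. Below is a minimal sketch, assuming a hypothetical layout of ./users/<user>/<class>/ and Unix-style symlink support; the names users, classes and merged are illustrative, not from the original question:

import os

from keras.preprocessing.image import ImageDataGenerator

users = ["user01", "user02", "user03"]  # hypothetical peer directories
classes = ["class_a", "class_b"]        # hypothetical shared class sub-directories

# Build merged/<class>/<user> links pointing back at users/<user>/<class>
for cls in classes:
    os.makedirs(os.path.join("merged", cls), exist_ok=True)
    for user in users:
        src = os.path.abspath(os.path.join("users", user, cls))
        dst = os.path.join("merged", cls, user)
        if not os.path.exists(dst):
            os.symlink(src, dst)  # one link per user; no files are copied

flow = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    "merged",
    target_size=(128, 128),
    class_mode="categorical",
    batch_size=32,
    follow_links=True,  # walk into the symlinked per-user directories
)

Each entry in flow.filenames then starts with <class>/<user>/, so you can still tell whose data every image came from.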

Batch size of any sub-generator:

b = (B * n) / N

where:
b = batch size of that sub-generator
B = desired batch size of the merged generator
n = number of images in that sub-generator's directory
N = total number of images in all directories (the sum of n over all sub-generators)
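
For example (hypothetical numbers), with a desired merged batch size B = 32 and two directories holding 300 and 100 images, the sub-batch sizes work out to 32 * 300/400 = 24 and 32 * 100/400 = 8, which together fill one merged batch of 32. This is exactly what build_datagenerator below computes from the directory counts.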

See the code below; this may help:

from keras.preprocessing.image import ImageDataGenerator
from keras.utils import Sequence
import matplotlib.pyplot as plt
import numpy as np
import os


class MergedGenerators(Sequence):

    def __init__(self, batch_size, generators, sub_batch_size):
        # generators: the flow_from_directory iterators to merge
        # sub_batch_size: per-generator batch sizes; they should sum to batch_size
        self.generators = generators
        self.sub_batch_size = sub_batch_size
        self.batch_size = batch_size

    def __len__(self):
        # Number of merged batches per epoch: total images across all
        # sub-generators divided by the merged batch size
        return int(
            sum([(len(self.generators[idx]) * self.sub_batch_size[idx])
                 for idx in range(len(self.sub_batch_size))]) /
            self.batch_size)

    def __getitem__(self, index):
        """Getting items from the generators and packing them"""

        X_batch = []
        Y_batch = []
        for generator in self.generators:
            # Wrap the index so shorter sub-generators repeat instead of
            # running out before the merged generator does
            if generator.class_mode is None:
                x1 = generator[index % len(generator)]
                X_batch = [*X_batch, *x1]

            else:
                x1, y1 = generator[index % len(generator)]
                X_batch = [*X_batch, *x1]
                Y_batch = [*Y_batch, *y1]

        if self.generators[0].class_mode is None:
            return np.array(X_batch)
        return np.array(X_batch), np.array(Y_batch)


def build_datagenerator(dir1=None, dir2=None, batch_size=32):
    n_images_in_dir1 = sum([len(files) for r, d, files in os.walk(dir1)])
    n_images_in_dir2 = sum([len(files) for r, d, files in os.walk(dir2)])

    # Set a different batch size for each generator, since the two directories
    # hold different numbers of images: each generator's share of the merged
    # batch is proportional to its image count
    generator1_batch_size = int((n_images_in_dir1 * batch_size) /
                                (n_images_in_dir1 + n_images_in_dir2))

    generator2_batch_size = batch_size - generator1_batch_size

    generator1 = ImageDataGenerator(
        rescale=1. / 255,
        shear_range=0.2,
        zoom_range=0.2,
        rotation_range=5.,
        horizontal_flip=True,
    )

    generator2 = ImageDataGenerator(
        rescale=1. / 255,
        zoom_range=0.2,
        horizontal_flip=False,
    )

    # generator2 has different image augmentation attributes than generator1
    generator1 = generator1.flow_from_directory(
        dir1,
        target_size=(128, 128),
        color_mode='rgb',
        class_mode=None,
        batch_size=generator1_batch_size,
        shuffle=True,
        seed=42,
        interpolation="bicubic",
    )

    generator2 = generator2.flow_from_directory(
        dir2,
        target_size=(128, 128),
        color_mode='rgb',
        class_mode=None,
        batch_size=generator2_batch_size,
        shuffle=True,
        seed=42,
        interpolation="bicubic",
    )

    return MergedGenerators(
        batch_size,
        generators=[generator1, generator2],
        sub_batch_size=[generator1_batch_size, generator2_batch_size])


def test_datagen(batch_size=32):
    datagen = build_datagenerator(dir1="./asdf",
                                  dir2="./asdf2",
                                  batch_size=batch_size)

    print("Datagenerator length (Batch count):", len(datagen))

    for batch_count, image_batch in enumerate(datagen):
        if batch_count == 1:
            break  # only visualize the first merged batch

        print("Images: ", image_batch.shape)

        plt.figure(figsize=(10, 10))
        for i in range(image_batch.shape[0]):
            plt.subplot(1, batch_size, i + 1)
            plt.imshow(image_batch[i], interpolation='nearest')
            plt.axis('off')
        plt.tight_layout()
        plt.show()


test_datagen(4)
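
Because MergedGenerators subclasses keras.utils.Sequence, the merged generator can in principle be fed straight into training. A hypothetical usage sketch, assuming model is an already-compiled Keras model and the two flow_from_directory calls above use class_mode='categorical' (instead of None) so that label batches are returned as well:

# Hypothetical usage; `model` is assumed to be a compiled Keras model, and the
# sub-generators are assumed to yield (x, y) pairs via class_mode='categorical'.
# Newer Keras/tf.keras accepts a Sequence in model.fit; older versions use
# model.fit_generator instead.
train_gen = build_datagenerator(dir1="./user01", dir2="./user02", batch_size=32)
model.fit(train_gen, epochs=10)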

