Handle invalid/corrupted image files in ImageDataGenerator.flow_from_directory in Keras


Question


I am using Python with Keras, running ImageDataGenerator with flow_from_directory. Some of my image files are problematic. Can the data generator handle the read errors?


I get a "not valid jpg file" error on a small portion of the images and would like to handle it without my code crashing.

Recommended answer


One solution is to modify the ImageDataGenerator code and add an error-handling mechanism (i.e. try/except) to it.



An alternative is to wrap your generator inside another generator and use try/except there. The disadvantage of this solution is that it throws away the whole generated batch even if only a single image in that batch is corrupted, which means some samples may never be used for training:

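from keras.preprocessing.image import ImageDataGenerator

data_gen = ImageDataGenerator(...)

train_gen = data_gen.flow_from_directory(...)

def my_gen(gen):
    """Yield batches from gen, skipping any batch that fails to load."""
    while True:
        try:
            data, labels = next(gen)
        except StopIteration:
            return  # the wrapped generator is exhausted
        except Exception:
            continue  # a corrupted image broke this batch; fetch the next one
        # yield outside the try block so that closing the generator is not swallowed
        yield data, labels

# ... define your model and compile it

# fit the model (recent Keras versions deprecate fit_generator; model.fit accepts generators)
model.fit_generator(my_gen(train_gen), ...)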


Another disadvantage of this solution is that you need to specify the number of generator steps (i.e. steps_per_epoch). When a batch is thrown away in a step, a new batch is fetched in that same step, so you may end up training on some samples more than once in an epoch. Whether this has a significant effect depends on how many batches include corrupted images; if there are only a few, there is little to worry about.
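The effect can be reproduced without Keras. The following is only a minimal sketch: FlakyBatches is a hypothetical stand-in for the directory iterator (it is not part of Keras), and skip_bad_batches is an illustrative name for a wrapper in the spirit of my_gen. With three batches per epoch, one of them corrupted, three steps end up visiting the first batch twice.

```python
class FlakyBatches:
    """Hypothetical stand-in for a Keras directory iterator.

    It cycles over a fixed list of batches forever and raises OSError
    whenever it reaches a batch marked as corrupted (None).
    """

    def __init__(self, batches):
        self.batches = batches
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        batch = self.batches[self.position % len(self.batches)]
        self.position += 1
        if batch is None:
            raise OSError("not a valid jpg file")
        return batch


def skip_bad_batches(gen):
    """Wrapper in the spirit of my_gen: drop any batch that fails to load."""
    while True:
        try:
            data, labels = next(gen)
        except StopIteration:
            return  # wrapped iterator is exhausted
        except Exception:
            continue  # skip the failed batch and fetch the next one
        yield data, labels


# One "epoch" of three batches, the middle one corrupted.
source = FlakyBatches([("x0", "y0"), None, ("x2", "y2")])
wrapped = skip_bad_batches(source)

# steps_per_epoch = 3: the skipped batch is replaced by the first batch again.
steps = [next(wrapped) for _ in range(3)]
print(steps)  # [('x0', 'y0'), ('x2', 'y2'), ('x0', 'y0')]
```

One way to limit the repetition, if the number of bad batches is known in advance, would be to lower steps_per_epoch accordingly; that is a judgment call rather than something the wrapper can do for you.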


Finally, note that you may want to use the newer Keras data generator, i.e. the Sequence class, to read the images of each batch one by one in the __getitem__ method and discard the corrupted ones. However, the problem of the previous approach, i.e. training on some images more than once, is still present here, since you also need to implement the __len__ method, which is essentially equivalent to the steps_per_epoch argument. Still, in my opinion, this approach (subclassing Sequence) is superior to the one above, setting aside that you may need to write more code, and it has fewer side effects, since you can discard a single image rather than the whole batch.
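A rough sketch of such a subclass might look like the following. The class name SkipCorruptedSequence, the load_rgb helper, and the choice of Pillow for decoding are all assumptions made for illustration, not something the answer specifies. It also assumes you already have lists of file paths and labels, whereas flow_from_directory builds these from the folder layout. The import fallback only exists so the sketch can be run without TensorFlow installed.

```python
import numpy as np

try:  # use the real Keras base class when TensorFlow is installed
    from tensorflow.keras.utils import Sequence
except ImportError:  # fallback so the sketch still runs without it
    Sequence = object


def load_rgb(path, size=(224, 224)):
    """Hypothetical loader: read one image with Pillow as a float32 array."""
    from PIL import Image  # imported lazily; only this default loader needs it
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB").resize(size), dtype="float32") / 255.0


class SkipCorruptedSequence(Sequence):
    """Batches of (images, labels); unreadable files are dropped one by one."""

    def __init__(self, paths, labels, batch_size, loader=load_rgb):
        super().__init__()
        self.paths, self.labels = list(paths), list(labels)
        self.batch_size = batch_size
        self.loader = loader

    def __len__(self):
        # Number of batches per epoch; plays the role of steps_per_epoch.
        return -(-len(self.paths) // self.batch_size)  # ceiling division

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = start + self.batch_size
        images, labels = [], []
        for path, label in zip(self.paths[start:stop], self.labels[start:stop]):
            try:
                images.append(self.loader(path))
            except (OSError, ValueError):
                continue  # corrupted or unreadable image: skip only this sample
            labels.append(label)
        if not images:
            # Not handled in this sketch: a batch with no readable image at all.
            raise ValueError(f"batch {idx} contains no readable images")
        return np.stack(images), np.asarray(labels)
```

Batches that lose an image simply come out smaller than batch_size. Pillow reports unreadable files with OSError subclasses, which is why that exception is caught here; other decoders may raise different exceptions.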
