Handle invalid/corrupted image files in ImageDataGenerator.flow_from_directory in Keras

Question

I am using Python with Keras and running ImageDataGenerator with flow_from_directory. Some of my image files are problematic, so can I use the data generator to handle the read errors?

I am getting "not valid jpg file" errors on a small portion of the images and would like to handle them without my code crashing.

Recommended answer

Well, one solution is to modify the ImageDataGenerator code and put an error handling mechanism (i.e. try/except) in it.

However, one alternative is to wrap your generator inside another generator and use try/except there. The disadvantage of this solution is that it throws away the whole generated batch even if only a single image in that batch is corrupted, which means some samples may never be used for training at all:

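# Assumes the TensorFlow-bundled Keras; with standalone Keras the import is
# from keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator

data_gen = ImageDataGenerator(...)

train_gen = data_gen.flow_from_directory(...)

def my_gen(gen):
    """Yield batches from gen, skipping any batch whose loading raises."""
    while True:
        try:
            data, labels = next(gen)
        except StopIteration:
            # the wrapped iterator is exhausted (Keras iterators normally loop forever)
            return
        except Exception:
            # a corrupted image makes the whole batch fail: drop it and fetch the next one
            continue
        yield data, labels

# ... define your model and compile it

# fit the model (recent Keras versions accept the generator in model.fit directly)
model.fit_generator(my_gen(train_gen), ...)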

Another disadvantage of this solution is that you have to specify the number of steps per epoch (i.e. steps_per_epoch), and when a batch is thrown away in a step, a new batch is fetched in its place within the same step. As a result, you may end up training on some of the samples more than once in an epoch. Whether this has a significant effect depends on how many batches include corrupted images (i.e. if there are only a few, there is not much to worry about).
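To make the repeated-sample effect concrete, here is a small simulation in plain Python. It is only a simplified model, not Keras code: batches are represented by their index, the endless source stands in for the directory iterator, a predicate stands in for the read error, and names such as num_batches, bad and skipping are made up for the illustration. A real flow_from_directory iterator also shuffles by default, so the exact samples that repeat would differ.

```python
from collections import Counter
from itertools import cycle


def skipping(gen, is_bad):
    """Yield items from gen, silently skipping those flagged by is_bad."""
    for item in gen:
        if not is_bad(item):
            yield item


num_batches = 5                      # batches in one pass over the data (hypothetical)
bad = {2}                            # index of the batch holding a corrupted image
source = cycle(range(num_batches))   # endless source, standing in for the Keras iterator
wrapped = skipping(source, lambda b: b in bad)

# One epoch still asks for num_batches steps, so the skipped batch is
# replaced by the first batch of the next pass over the data.
steps_per_epoch = num_batches
seen = Counter(next(wrapped) for _ in range(steps_per_epoch))

for batch_index in range(num_batches):
    print(batch_index, seen[batch_index])
```

In this model the batch with the corrupted image is never used, and one of the other batches is drawn twice within the same epoch to fill the missing step.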

Finally, note that you may want to use the newer Keras data generator, i.e. the Sequence class, to read the images of each batch one by one in the __getitem__ method and discard the corrupted ones. However, the problem of the previous approach, i.e. training on some of the images more than once, is still present here, since you also need to implement the __len__ method, which is essentially equivalent to the steps_per_epoch argument. Still, in my opinion, subclassing the Sequence class is superior to the wrapper approach (if you put aside the fact that you may need to write more code) and has fewer side effects, since you can discard a single image rather than the whole batch.
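The per-image filtering described above could look roughly like the following sketch. It is written as a plain class with the same __len__/__getitem__ shape as a Keras Sequence so that it runs without Keras installed; in real use you would subclass Sequence and convert the returned lists to arrays. The names SkipCorruptBatches, load_fn and fake_load are hypothetical, and the assumption that unreadable files surface as OSError or ValueError depends on the image loader you plug in.

```python
import math


class SkipCorruptBatches:
    """Batch provider shaped like a Keras Sequence (__len__ and __getitem__)."""

    def __init__(self, paths, labels, batch_size, load_fn):
        self.paths = list(paths)
        self.labels = list(labels)
        self.batch_size = batch_size
        # Hypothetical loader: callable(path) -> image, raising on unreadable files.
        self.load_fn = load_fn

    def __len__(self):
        # Number of batches per epoch; plays the role of steps_per_epoch.
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = start + self.batch_size
        images, kept_labels = [], []
        for path, label in zip(self.paths[start:stop], self.labels[start:stop]):
            try:
                images.append(self.load_fn(path))
            except (OSError, ValueError):
                continue  # drop only this image, keep the rest of the batch
            kept_labels.append(label)
        # With Keras, convert to arrays here before returning.
        return images, kept_labels


def fake_load(path):
    """Stand-in loader for the demo: paths starting with 'bad' fail to load."""
    if path.startswith("bad"):
        raise OSError("not a valid jpg file")
    return path.upper()


batches = SkipCorruptBatches(
    ["a.jpg", "bad.jpg", "c.jpg", "d.jpg"], [0, 1, 0, 1],
    batch_size=2, load_fn=fake_load,
)
print(len(batches), batches[0], batches[1])
```

Note that in this sketch a batch with corrupted files simply comes out smaller, and a batch in which every file is unreadable comes out empty, so you may want to handle that case separately (for example by validating the files once before training).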
