Joining two DirectoryIterators in Keras
Question
Suppose I have the following:
from keras.preprocessing.image import ImageDataGenerator

image_data_generator = ImageDataGenerator(rescale=1./255)
train_generator = image_data_generator.flow_from_directory(
    'my_directory',
    target_size=(28, 28),
    batch_size=32,
    class_mode='categorical'
)
Then my train_generator is filled with data from my_directory, which contains two subfolders that separate the data into classes 0 and 1.
Suppose I also have another directory, that_directory, whose data is likewise split into classes 0 and 1. I want to augment my train_generator with this additional data.
Running train_generator = image_data_generator.flow_from_directory('that_directory', ...) replaces the generator, so the prior data from my_directory is no longer included.
Is there a way to augment or append both sets of data into one generator, or into an object that operates like a DirectoryIterator, without changing the folder structure itself?
Recommended answer
Just combine the generators in another generator, optionally with different augmentation configs:
# idg1_configs and idg2_configs are dicts of ImageDataGenerator keyword arguments
idg1 = ImageDataGenerator(**idg1_configs)
idg2 = ImageDataGenerator(**idg2_configs)

g1 = idg1.flow_from_directory('idg1_dir', ...)
g2 = idg2.flow_from_directory('idg2_dir', ...)

def combine_gen(*gens):
    # Yield one batch from each generator in turn, forever.
    while True:
        for g in gens:
            yield next(g)

# ...

model.fit_generator(combine_gen(g1, g2), steps_per_epoch=len(g1) + len(g2), ...)
This generates batches alternately from g1 and g2.
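The alternation can be checked without Keras. The following is only a minimal sketch: fake_batches is a hypothetical stand-in for an endless DirectoryIterator, not part of Keras, and the batch labels are made up for illustration.

```python
from itertools import islice

def fake_batches(name):
    """Hypothetical stand-in for an endless Keras iterator: yields labelled batch ids forever."""
    i = 0
    while True:
        yield f"{name}-batch{i}"
        i += 1

def combine_gen(*gens):
    # Round-robin: take one batch from each generator in turn, forever.
    while True:
        for g in gens:
            yield next(g)

# Take only the first four batches, since the combined generator never ends.
first_four = list(islice(combine_gen(fake_batches("g1"), fake_batches("g2")), 4))
print(first_four)
```

The batches come out in the order g1, g2, g1, g2, so each source contributes one batch per round regardless of how many batches it holds.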
Note that one might suggest using itertools.chain. However, you can't use it here, because the generators produced by ImageDataGenerator never end and keep yielding batches of data indefinitely, so chain would never move past the first one. This is the expected behaviour for the generator you pass to the fit_generator method. From the Keras docs:
...The generator is expected to loop over its data indefinitely. An epoch finishes when steps_per_epoch batches have been seen by the model.
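To see why itertools.chain does not help with endless generators, here is a small sketch. The two generators are hypothetical stand-ins built on itertools.count, not Keras iterators.

```python
from itertools import chain, count, islice

# Hypothetical stand-ins for two endless batch generators.
g1 = (("g1", i) for i in count())
g2 = (("g2", i) for i in count())

# chain() only moves on to g2 once g1 is exhausted, which never happens.
sample = list(islice(chain(g1, g2), 1000))
sources = {src for src, _ in sample}
print(sources)
```

However many batches are drawn, every one of them comes from g1, which is why a round-robin generator is used instead.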
If steps_per_epoch is not set, it defaults to len(generator), where generator is the object you pass to the fit_generator method. This only works for objects that define a length: the iterators returned by ImageDataGenerator can report their length, so you don't need to set steps_per_epoch manually for them. To have a length available for the combined generator as well, you can use this solution instead:
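class CombinedGen():
    def __init__(self, *gens):
        self.gens = gens

    def generate(self):
        # Yield one batch from each wrapped generator in turn, forever.
        while True:
            for g in self.gens:
                yield next(g)

    def __len__(self):
        # Total number of batches per epoch across all wrapped generators.
        return sum([len(g) for g in self.gens])

# usage:
cg = CombinedGen(g1, g2)
# cg.generate() is a plain generator with no length, so pass len(cg) explicitly
model.fit_generator(cg.generate(), steps_per_epoch=len(cg), ...)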
You can also add __next__ and/or __iter__ methods to the CombinedGen class if you want to iterate directly over objects of this class (instead of iterating over cg.generate()).
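One possible shape for such a class is sketched below. The FakeIterator helper is hypothetical and only mimics the two things relied on here, next() and len(). Whether Keras accepts an object like this in place of a generator depends on the Keras version, so that part is not shown.

```python
class CombinedGen:
    """Round-robin wrapper; assumes each wrapped object supports next() and len()."""

    def __init__(self, *gens):
        self.gens = gens
        self._stream = self._generate()

    def _generate(self):
        # Yield one batch from each wrapped iterator in turn, forever.
        while True:
            for g in self.gens:
                yield next(g)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._stream)

    def __len__(self):
        # Combined number of batches per epoch.
        return sum(len(g) for g in self.gens)


class FakeIterator:
    """Hypothetical stand-in for a DirectoryIterator: endless, with a fixed length."""

    def __init__(self, name, n_batches):
        self.name, self.n_batches, self.i = name, n_batches, 0

    def __len__(self):
        return self.n_batches

    def __next__(self):
        # Wrap around after n_batches so the iterator never ends.
        batch = (self.name, self.i % self.n_batches)
        self.i += 1
        return batch


cg = CombinedGen(FakeIterator("a", 2), FakeIterator("b", 3))
batches = [next(cg) for _ in range(4)]
print(len(cg), batches)
```

With this version the object itself can be passed to next() or used in a for loop, and len(cg) still gives the combined step count.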