Custom Keras Data Generator with yield


Problem Description

I am trying to create a custom data generator and don't know how to combine the yield keyword with an infinite loop inside the __getitem__ method.

EDIT: After reading the answer, I realized that the code I am using is a Sequence, which doesn't need a yield statement.

Currently I am returning multiple images with a return statement:

import numpy as np
import cv2
import tensorflow

class DataGenerator(tensorflow.keras.utils.Sequence):
    def __init__(self, files, labels, batch_size=32, shuffle=True, random_state=42):
        'Initialization'
        self.files = files
        self.labels = labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.random_state = random_state
        self.on_epoch_end()

    def __len__(self):
        return int(np.floor(len(self.files) / self.batch_size))

    def __getitem__(self, index):
        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]

        files_batch = [self.files[k] for k in indexes]
        y = [self.labels[k] for k in indexes]

        # Generate data
        x = self.__data_generation(files_batch)

        return x, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.files))
        if self.shuffle == True:
            np.random.seed(self.random_state)
            np.random.shuffle(self.indexes)


    def __data_generation(self, files):
        imgs = []

        for img_file in files:

            img = cv2.imread(img_file, -1)  # -1 (IMREAD_UNCHANGED) keeps the original bit depth and channels

            ###############
            # Augment image
            ###############

            imgs.append(img) 

        return imgs
 

In this article I saw that yield is used inside an infinite loop. I don't quite understand that syntax. How does the loop ever exit?

Solution

You are using the Sequence API, which works a bit differently from plain generators. In a generator function, you use the yield keyword to perform the iteration inside a while True: loop, so each time Keras calls the generator it gets a batch of data, and the generator automatically wraps around at the end of the data.
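
To make that syntax concrete, here is a minimal sketch (not from the original post; the image_batch_generator name and its arguments are just illustrative) of such a plain generator. The while True: loop is never escaped: yield suspends the function after each batch, and execution resumes at that same point the next time a batch is requested.

import numpy as np
import cv2

def image_batch_generator(files, labels, batch_size=32):
    'A plain generator: yields one (x, y) batch per request, forever'
    i = 0
    while True:  # never exited; yield just suspends the function here
        batch_files = files[i:i + batch_size]
        batch_labels = labels[i:i + batch_size]
        imgs = [cv2.imread(f, -1) for f in batch_files]
        yield np.array(imgs), np.array(batch_labels)
        i += batch_size
        if i >= len(files):  # wrap around to the start of the data
            i = 0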

But in a Sequence, the __getitem__ method receives an index parameter, so no iteration or yield is required; Keras performs that for you. This design lets a Sequence run in parallel using multiprocessing, which is not possible with the old generator functions.
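
As a usage sketch (the DataGenerator class is the one above; the train_files, train_labels and model names are assumptions, not from the original post), in tf.keras 2.x a Sequence is typically passed straight to fit:

train_gen = DataGenerator(train_files, train_labels, batch_size=32, shuffle=True)

# Keras calls train_gen[i] (__getitem__) for every batch index and
# on_epoch_end() between epochs; workers/use_multiprocessing let the
# batches be prepared in parallel worker processes.
model.fit(train_gen, epochs=10, workers=4, use_multiprocessing=True)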

So you are doing things the right way; no change is needed.

