What is the parameter "max_q_size" used for in "model.fit_generator"?


Question

I built a simple generator that yields a tuple `(inputs, targets)` with only single items in the `inputs` and `targets` lists. Basically, it is crawling the data set, one sample item at a time.

I pass this generator into:

  model.fit_generator(my_generator(),
                      nb_epoch=10,
                      samples_per_epoch=1,
                      max_q_size=1  # defaults to 10
                      )
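For reference, such a crawling generator might look like the following minimal sketch (`my_generator` and its data are hypothetical; real code would yield numpy arrays shaped for the model):

```python
def my_generator():
    # Hypothetical single-item generator matching the call above.
    # In practice x and y would be numpy arrays shaped for the model;
    # plain lists keep this sketch dependency-free.
    while True:  # fit_generator expects the generator to loop forever
        for i in range(3):  # pretend the crawler finds 3 samples
            x = [[float(i)]]        # one input sample (batch of size 1)
            y = [[2.0 * float(i)]]  # its single target
            yield (x, y)
```

Each call to `next()` produces exactly one `(inputs, targets)` pair, which is what makes `samples_per_epoch=1` consume one item per epoch.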

I understand that:

  • nb_epoch is the number of times the training batch will be run
  • samples_per_epoch is the number of samples trained with per epoch

But what is max_q_size for, and why does it default to 10? I thought the purpose of using a generator was to batch data sets into reasonable chunks, so why the additional queue?

Answer

This simply defines the maximum size of the internal training queue, which is used to "precache" samples from your generator. It is used when the queue is built:

import queue
import threading
import time

def generator_queue(generator, max_q_size=10,
                    wait_time=0.05, nb_worker=1):
    '''Builds a threading queue out of a data generator.
    Used in `fit_generator`, `evaluate_generator`, `predict_generator`.
    '''
    q = queue.Queue()
    _stop = threading.Event()

    def data_generator_task():
        while not _stop.is_set():
            try:
                if q.qsize() < max_q_size:
                    try:
                        generator_output = next(generator)
                    except ValueError:
                        continue
                    q.put(generator_output)
                else:
                    time.sleep(wait_time)
            except Exception:
                _stop.set()
                raise

    generator_threads = [threading.Thread(target=data_generator_task)
                         for _ in range(nb_worker)]

    for thread in generator_threads:
        thread.daemon = True
        thread.start()

    return q, _stop
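To see the bounded-queue behavior in isolation, here is a standalone toy run of the same helper (reproduced with its imports so the sketch runs outside Keras; the `counting` generator is illustrative):

```python
import queue
import threading
import time

def generator_queue(generator, max_q_size=10, wait_time=0.05, nb_worker=1):
    # Same structure as the Keras helper above, repeated here so this
    # sketch is self-contained.
    q = queue.Queue()
    _stop = threading.Event()

    def data_generator_task():
        while not _stop.is_set():
            try:
                if q.qsize() < max_q_size:
                    try:
                        generator_output = next(generator)
                    except ValueError:
                        continue
                    q.put(generator_output)
                else:
                    time.sleep(wait_time)
            except Exception:
                _stop.set()
                raise

    generator_threads = [threading.Thread(target=data_generator_task)
                         for _ in range(nb_worker)]
    for thread in generator_threads:
        thread.daemon = True
        thread.start()
    return q, _stop

def counting():
    # Infinite illustrative generator: yields 0, 1, 2, ...
    i = 0
    while True:
        yield i
        i += 1

q, stop = generator_queue(counting(), max_q_size=3)
time.sleep(0.3)     # give the worker thread time to fill the queue
filled = q.qsize()  # the producer stops at max_q_size items
first = q.get()     # consuming an item frees a slot for the producer
stop.set()          # tell the worker thread to exit
print(filled, first)
```

The worker fills the queue until `qsize()` reaches `max_q_size` and then only polls, so the queue holds at most `max_q_size` prefetched items at any moment.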

In other words, you have a thread filling the queue up to the given maximum capacity directly from your generator, while (for example) the training routine consumes its elements (and sometimes waits for them to arrive):

 while samples_seen < samples_per_epoch:
     generator_output = None
     while not _stop.is_set():
         if not data_gen_queue.empty():
             generator_output = data_gen_queue.get()
             break
         else:
             time.sleep(wait_time)

And why a default of 10? Like most defaults, there is no particular reason - it simply makes sense, but you could use different values too.

A construction like this suggests that the authors had expensive data generators in mind, which might take time to execute. For example, consider downloading data over a network in the generator call - then it makes sense to precache the next few batches and download upcoming ones in parallel, for the sake of efficiency and robustness to network errors.
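As a rough illustration of that benefit, the sketch below (the names `slow_download` and `prefetch` are invented for this example) overlaps a slow producer with a slow consumer through a bounded queue, which is the same idea `generator_queue` implements with polling:

```python
import queue
import threading
import time

def slow_download():
    # Stand-in for an expensive generator, e.g. fetching batches
    # over a network; each item takes ~50 ms to produce.
    for i in range(5):
        time.sleep(0.05)
        yield i

def prefetch(gen, max_q_size=10):
    # Minimal prefetching wrapper: a background thread fills a
    # bounded queue ahead of the consumer.
    q = queue.Queue(maxsize=max_q_size)
    sentinel = object()

    def worker():
        for item in gen:
            q.put(item)   # blocks when the queue is full
        q.put(sentinel)   # signal that the generator is exhausted

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

start = time.time()
results = []
for batch in prefetch(slow_download(), max_q_size=2):
    time.sleep(0.05)  # pretend to train on the batch
    results.append(batch)
elapsed = time.time() - start
# Because downloads overlap with "training", the wall time is closer
# to the 5 * 50 ms training cost than to the serial 10 * 50 ms total.
print(results)
```

Items still arrive in order; the queue only changes *when* they are produced, not what is produced.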

