How to define max_queue_size, workers and use_multiprocessing in keras fit_generator()?


Question

I am applying transfer-learning on a pre-trained network using the GPU version of keras. I don't understand how to define the parameters max_queue_size, workers, and use_multiprocessing. If I change these parameters (primarily to speed-up learning), I am unsure whether all data is still seen per epoch.

max_queue_size:

  • maximum size of the internal training queue which is used to "precache" samples from the generator

Question: Does this refer to how many batches are prepared on CPU? How is it related to workers? How to define it optimally?

workers:

  • number of threads generating batches in parallel. Batches are computed in parallel on the CPU and passed on the fly onto the GPU for neural network computations

Question: How do I find out how many batches my CPU can/should generate in parallel?

use_multiprocessing:

  • whether to use process-based threading

Question: Do I have to set this parameter to true if I change workers? Does it relate to CPU usage?

A related question can be found here: A detailed example of how to use data generators with Keras.

I am using fit_generator() as follows:

    history = model.fit_generator(generator=trainGenerator,
                                  steps_per_epoch=trainGenerator.samples//nBatches,   # total number of steps (batches of samples)
                                  epochs=nEpochs,                   # number of epochs to train the model
                                  verbose=2,                        # verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch
                                  callbacks=callback,               # keras.callbacks.Callback instances to apply during training
                                  validation_data=valGenerator,     # generator or tuple on which to evaluate the loss and any model metrics at the end of each epoch
                                  validation_steps=valGenerator.samples//nBatches,    # number of steps (batches of samples) to yield from validation_data before stopping at the end of every epoch
                                  class_weight=classWeights,        # optional dictionary mapping class indices (integers) to a weight (float), used for weighting the loss function
                                  max_queue_size=10,                # maximum size for the generator queue
                                  workers=1,                        # maximum number of processes to spin up when using process-based threading
                                  use_multiprocessing=False,        # whether to use process-based threading
                                  shuffle=True,                     # whether to shuffle the order of the batches at the beginning of each epoch
                                  initial_epoch=0)
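As a side note on the worry about whether all data is still seen per epoch: a quick arithmetic sketch (with made-up numbers, not the actual dataset sizes above) shows what the floor division in steps_per_epoch implies.

```python
# Illustrative numbers only; trainGenerator.samples and nBatches
# would come from the actual setup above.
samples, batch_size = 1000, 32

# steps_per_epoch = samples // batch_size, as in the call above
steps_per_epoch = samples // batch_size
print(steps_per_epoch)               # 31 steps per epoch
print(steps_per_epoch * batch_size)  # 992 of 1000 samples drawn per epoch

# With floor division, up to batch_size - 1 samples are skipped each
# epoch; a generator that shuffles between epochs still covers all
# samples over the course of training.
```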

The specifications of my machine are:

CPU: 2x Xeon E5-2260, 2.6 GHz
Cores: 10
Graphic card: Titan X, Maxwell, GM200
RAM: 128 GB
HDD: 4TB
SSD: 512 GB

Answer

Q_0:

Question: Does this refer to how many batches are prepared on CPU? How is it related to workers? How to define it optimally?

From the link you posted, you can learn that your CPU keeps creating batches until the queue is full (max_queue_size) or until training stops. You want to have batches ready for your GPU to "take", so that the GPU never has to wait for the CPU. An ideal queue size is one large enough that your GPU always runs near its maximum and never waits for the CPU to prepare new batches.
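To make the queue mechanics concrete, here is a small sketch using Python's standard queue module rather than Keras itself; Keras's internal enqueuer works on the same bounded producer/consumer principle (this is a simplified model, not the actual Keras code):

```python
import queue
import threading
import time

def batch_generator():
    """Stand-in for a Keras data generator."""
    i = 0
    while True:
        time.sleep(0.01)          # simulate CPU-side preprocessing
        yield f"batch_{i}"
        i += 1

max_queue_size = 10
q = queue.Queue(maxsize=max_queue_size)   # bounded, like max_queue_size

def producer(gen, q, n):
    for _ in range(n):
        q.put(next(gen))          # blocks while the queue is full

# Worker thread keeps the queue topped up (the "workers" role) ...
t = threading.Thread(target=producer, args=(batch_generator(), q, 20))
t.start()

# ... while the training loop (the "GPU" side) consumes batches.
consumed = [q.get() for _ in range(20)]
t.join()
print(consumed[:3])   # ['batch_0', 'batch_1', 'batch_2']
```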

Q_1:

Question: How do I find out how many batches my CPU can/should generate in parallel?

If you see that your GPU is idling and waiting for batches, try increasing the number of workers and perhaps also the queue size.
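There is no official formula for the worker count; one common starting point (a heuristic to benchmark against your own pipeline, not a rule) is to give the generator most of the available cores and profile from there:

```python
import multiprocessing

# Logical core count of the machine (e.g. 20 for the 10-core Xeons
# above with hyper-threading).
n_cpus = multiprocessing.cpu_count()

# Leave one core for the main training loop; then measure GPU
# utilization and adjust up or down.
workers = max(1, n_cpus - 1)
print(workers)
```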

Q_2:

Do I have to set this parameter to true if I change workers? Does it relate to CPU usage?

Here is a practical analysis of what happens when you set it to True or False. Here is a recommendation to set it to False to prevent freezing (in my setup True works fine without freezing). Perhaps someone else can increase our understanding of the topic.
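One related suggestion: Keras also accepts a keras.utils.Sequence subclass instead of a plain generator, and Sequence is the variant documented as safe with use_multiprocessing=True, because each batch is addressed by index rather than through shared iterator state. The sketch below shows the Sequence shape; the base class import is commented out so the snippet runs without Keras installed:

```python
import numpy as np
# from keras.utils import Sequence   # assumed import path

class BatchSequence:   # in real code: class BatchSequence(Sequence)
    """Index-addressable batches, safe to read from multiple processes."""
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # number of batches per epoch (ceil, so no sample is dropped)
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        s = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[s], self.y[s]

seq = BatchSequence(np.arange(10), np.arange(10), batch_size=4)
print(len(seq))    # 3 batches
print(seq[2][0])   # last (partial) batch: [8 9]
```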

Try not to have a sequential setup, try to enable the CPU to provide enough data for the GPU.

Also: You could (should?) create several questions the next time, so that it is easier to answer them.
