Is the order of batches guaranteed in Keras' OrderedEnqueuer?
Problem description
I have a custom keras.utils.Sequence which generates batches in a specific (and critical) order.
However, I need to parallelise batch generation across multiple cores. Does the name 'OrderedEnqueuer' imply that the order of batches in the resulting queue is guaranteed to be the same as the order of the original keras.utils.Sequence?
My reasons for thinking that this order is not guaranteed:
- OrderedEnqueuer uses Python multiprocessing's apply_async internally.
- Keras' docs explicitly say that OrderedEnqueuer is guaranteed not to duplicate batches, but not that the order is guaranteed.
My reasons for thinking that it is guaranteed:
- The name!
- I understand that keras.utils.Sequence objects are indexable.
- I found test scripts on Keras' GitHub which appear to be designed to verify order, although I could not find any documentation about whether these tests pass, or whether they are truly conclusive.
If the order here is not guaranteed, I would welcome any suggestions on how to parallelise batch preparation while maintaining a guaranteed order, with the proviso that it must be able to parallelise arbitrary Python code. I believe that, e.g., the tf.data.Dataset API does not allow this (tf.py_function calls back into the original Python process).
Recommended answer
Yes, it is ordered.
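Why is it ordered even though apply_async is used internally? Roughly, OrderedEnqueuer submits the batch indices in order (with the default shuffle=False) and keeps the returned AsyncResult handles in a FIFO queue; get() then waits on those handles in the same order, so a slow batch blocks the ones behind it instead of being overtaken. Below is a simplified, Keras-free sketch of that pattern using the standard library's multiprocessing.pool.ThreadPool. The names slow_get_item and ordered_batches are hypothetical illustrations, not Keras API, and the real enqueuer feeds a bounded queue from a background thread rather than submitting everything at once.

```python
import random
import threading
import time
from multiprocessing.pool import ThreadPool

completion_order = []  # order in which batches actually finish
_lock = threading.Lock()

def slow_get_item(i):
    # Hypothetical stand-in for Sequence.__getitem__ with variable build time
    time.sleep(random.uniform(0.0, 0.05))
    with _lock:
        completion_order.append(i)
    return i

def ordered_batches(n_batches, workers=4):
    """Yield slow_get_item(i) for i in range(n_batches), in index order."""
    with ThreadPool(workers) as pool:
        # Submit indices in order; apply_async returns an AsyncResult at once
        handles = [pool.apply_async(slow_get_item, (i,)) for i in range(n_batches)]
        # Resolve handles in submission order: .get() blocks until that
        # particular batch is ready, so faster later batches cannot overtake it
        for handle in handles:
            yield handle.get()

results = list(ordered_batches(10, workers=4))
print(results)           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(completion_order)  # finishing order, usually not sorted
```

The point is that "uses apply_async" and "returns results in order" are compatible: ordering comes from the order in which the handles are consumed, not from the order in which workers finish.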
Check it yourself with the following test.
First, let's create a dummy Sequence that just returns the batch index after waiting a random time (the random sleep makes it very likely that the batches do not finish in index order):
import time, random, datetime
import numpy as np
import tensorflow as tf

class DataLoader(tf.keras.utils.Sequence):
    def __len__(self):
        return 10

    def __getitem__(self, i):
        time.sleep(random.randint(1, 2))
        # you could add a print here to see that batches finish out of order
        return i
Now let's create a test function that creates the enqueuer and consumes 30 batches from it (three passes over the 10-batch sequence). The function takes the number of workers and prints the time taken as well as the results in the order they were returned.
def test(workers):
    enq = tf.keras.utils.OrderedEnqueuer(DataLoader())
    enq.start(workers=workers)
    gen = enq.get()
    results = []
    start = datetime.datetime.now()
    for i in range(30):
        results.append(next(gen))
    enq.stop()
    print('test with', workers, 'workers took', datetime.datetime.now() - start)
    print("results:", results)
Results:
test(1)
test(8)
test with 1 workers took 0:00:45.093122
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
test with 8 workers took 0:00:09.127771
results: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
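Note that:
- 8 workers are much faster than 1 worker, so the batch generation really is parallelised
- the results are in order in both cases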
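Regarding the proviso in the question (ordered parallelism for arbitrary Python code), the standard library offers the same guarantee outside Keras: concurrent.futures.Executor.map, like multiprocessing.Pool.map and Pool.imap, yields results in input order regardless of completion order. A minimal sketch follows, where make_batch is a hypothetical stand-in for your batch-preparation code. It uses ThreadPoolExecutor; for CPU-bound pure-Python work you would swap in ProcessPoolExecutor (same ordering semantics), which additionally needs make_batch to be picklable (defined at module level) and, on platforms that spawn processes, an if __name__ == "__main__": guard.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def make_batch(i):
    # Hypothetical batch builder: arbitrary Python work of variable duration
    time.sleep(random.uniform(0.0, 0.05))
    return [i] * 3  # stand-in for real batch contents

def ordered_parallel_batches(n_batches, workers=4):
    # Executor.map submits all calls up front and yields results
    # in the order of its inputs, not in completion order
    with ThreadPoolExecutor(max_workers=workers) as executor:
        yield from executor.map(make_batch, range(n_batches))

batches = list(ordered_parallel_batches(6))
print(batches)  # [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4], [5, 5, 5]]
```

One design difference worth noting: Executor.map submits every item immediately, so all finished batches may sit in memory until consumed, whereas OrderedEnqueuer bounds the number of in-flight batches with max_queue_size.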