How to write a caffe python data layer with preload?
Problem description
How to write an asynchronous data layer to preload batches while other processing is performed? Is there some example code? Thanks
Answer
There are several ways you can achieve what you want. I'll try and sketch one option here.
The overall view of the system is: you have n Loaders asynchronously loading data and feeding a queue. The layer then reads batch_size items from the queue and feeds the net in the forward() function.
import caffe, multiprocessing

class Loader(multiprocessing.Process):
    def __init__(self, outq, *args, **kwargs):
        super(Loader, self).__init__()
        self.daemon = True
        self.outq = outq
        self.start()  # start working

    def run(self):
        while True:  # read and never stop at all!
            try:
                # do your magic here
                # assuming you load x,y pairs
                self.outq.put((x[None, ...], y[None, ...]))  # add singleton "batch" dimension
            except Exception as e:
                # handle errors?
                pass

class MultiProcessInputLayer(caffe.Layer):
    def setup(self, bottom, top):
        # verify no bottoms, right number of tops etc.
        self.dataQ = multiprocessing.Queue()
        for _ in xrange(n):
            Loader(self.dataQ)  # start n Loaders
        # some other stuff here...

    def reshape(self, bottom, top):
        # reshape the inputs to the right sizes
        pass

    def forward(self, bottom, top):
        for i in xrange(batch_size):
            item = self.dataQ.get()
            top[0].data[i, ...] = item[0]
            top[1].data[i, ...] = item[1]

    def backward(self, top, propagate_down, bottom):
        pass  # no backward for data layer
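To actually use this layer, it has to be declared in the net prototxt as a "Python" layer. A minimal sketch, assuming the class above is saved in a file named my_data_layer.py that is on the PYTHONPATH (the module and layer names here are illustrative, not from the original answer):

```
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "my_data_layer"          # hypothetical module file name
    layer: "MultiProcessInputLayer"  # class name inside that module
  }
}
```

Note that caffe must be built with WITH_PYTHON_LAYER=1 for "Python" layers to be available.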
Some tips and tricks I learned the hard way:
1. Use the multiprocessing package and not threading, because of the GIL.
2. Sometimes (e.g. if batch_size is very large) it will take forward() a long time to read item by item from the queue to form each batch. In that case, you might add another multiprocessing.Process that asynchronously reads batch_size items from self.dataQ and writes whole batches to self.batchQ. forward() then only waits for a single item from self.batchQ on each call.
3. Take care not to copy the data around too much: working with large images/labels can turn all this copying into a bottleneck.
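Tip 2 above can be sketched as a small helper process. This is a minimal sketch, not from the original answer; the name BatchAssembler and the queue arguments are illustrative. It would sit between the Loaders' output queue (self.dataQ) and a new self.batchQ that forward() reads from:

```python
import multiprocessing

class BatchAssembler(multiprocessing.Process):
    """Collects batch_size single items from in_q and forwards
    them to out_q as one ready-made batch (tip 2)."""
    def __init__(self, in_q, out_q, batch_size):
        super(BatchAssembler, self).__init__()
        self.daemon = True
        self.in_q = in_q
        self.out_q = out_q
        self.batch_size = batch_size
        self.start()

    def run(self):
        while True:
            # blocking gets: gather a full batch before handing it on
            batch = [self.in_q.get() for _ in range(self.batch_size)]
            self.out_q.put(batch)
```

With this in place, forward() replaces its batch_size-iteration read loop with a single self.batchQ.get() per call, so the per-item queue latency is paid by the assembler process instead of the solver's forward pass.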