How to write a caffe python data layer with preload?
Question
How do you write an asynchronous data layer that preloads batches while other processing is performed? Is there some example code? Thanks
Answer
There are several ways you can achieve what you want. I'll try to sketch one option here.
The overall view of the system is: you have n Loaders asynchronously loading data and feeding a queue. The layer then reads batch_size items from the queue and feeds the net in the forward() function.
import caffe, multiprocessing


class Loader(multiprocessing.Process):
    def __init__(self, outq, *args, **kwargs):
        super(Loader, self).__init__()
        self.daemon = True
        self.outq = outq
        self.start()  # start working

    def run(self):
        while True:  # read and never stop at all!
            try:
                # do your magic here
                # assuming you load x,y pairs
                self.outq.put((x[None, ...], y[None, ...]))  # add singleton "batch" dimension
            except Exception as e:
                # handle errors?
                pass


class MultiProcessInputLayer(caffe.Layer):
    def setup(self, bottom, top):
        # verify no bottoms, right number of tops etc.
        self.dataQ = multiprocessing.Queue()
        for _ in range(n):
            Loader(self.dataQ)  # start n Loaders
        # some other stuff here...

    def reshape(self, bottom, top):
        # reshape the inputs to the right sizes
        pass

    def forward(self, bottom, top):
        for i in range(batch_size):
            item = self.dataQ.get()
            top[0].data[i, ...] = item[0]
            top[1].data[i, ...] = item[1]

    def backward(self, top, propagate_down, bottom):
        pass  # no backward for data layer
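For completeness, a layer like this is hooked into the net through a `type: "Python"` layer in the prototxt (caffe must be built with `WITH_PYTHON_LAYER=1`). The module name `my_data_layer` below is an assumption for illustration; it should be whatever importable module on your PYTHONPATH contains the class:

```
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "my_data_layer"          # file my_data_layer.py on PYTHONPATH (assumed name)
    layer: "MultiProcessInputLayer"  # the caffe.Layer subclass defined above
  }
}
```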
Some tips and tricks I learned the hard way:

1. Use the multiprocessing package, and not threading, because of the GIL.
2. Sometimes (e.g. if batch_size is very large) it will take very long for forward() to read item by item from the queue to form each batch. In that case, you might add another multiprocessing.Process that asynchronously reads batch_size items from self.dataQ and writes whole batches to self.batchQ. Then forward() only waits for a single item from self.batchQ at each call.
3. Take care not to copy the data around too much. Working with large images/labels can turn all this copying into a bottleneck.
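Tip 2 can be sketched as one extra process sitting between the two queues; the class name Batcher is my own placeholder, not part of the original answer, and the items here are plain integers standing in for (data, label) pairs:

```python
import multiprocessing


class Batcher(multiprocessing.Process):
    """Reads batch_size single items from inq and puts whole batches on outq,
    so the layer's forward() only blocks on one outq.get() per batch."""

    def __init__(self, inq, outq, batch_size):
        super(Batcher, self).__init__()
        self.daemon = True  # die together with the parent process
        self.inq = inq
        self.outq = outq
        self.batch_size = batch_size
        self.start()

    def run(self):
        while True:
            # assemble one full batch item by item, off the critical path
            batch = [self.inq.get() for _ in range(self.batch_size)]
            self.outq.put(batch)


if __name__ == '__main__':
    dataQ = multiprocessing.Queue()   # fed by the Loaders
    batchQ = multiprocessing.Queue()  # read by forward()
    Batcher(dataQ, batchQ, batch_size=3)
    for i in range(3):
        dataQ.put(i)                  # stand-ins for loaded samples
    print(batchQ.get())               # → [0, 1, 2]
```

In the real layer, run() would also stack the items into one numpy array per top blob before putting them on batchQ, so forward() can assign whole batches at once.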