How to write a caffe python data layer with preload?


Question


How do I write an asynchronous data layer that preloads batches while other processing is performed? Is there some example code? Thanks

Answer


There are several ways you can achieve what you want. I'll try and sketch one option here.


The overall view of the system is: you have n Loaders asynchronously loading data and feeding a queue. The layer then reads batch_size items from the queue and feeds the net in the forward() function.

import caffe, multiprocessing

class Loader(multiprocessing.Process):
  def __init__(self, outq, *args, **kwargs):
    super(Loader, self).__init__()
    self.daemon = True  # die together with the main process
    self.outq = outq
    self.start()  # start working immediately

  def run(self):
    while True:  # read and never stop at all!
      try:
        # do your magic here
        # assuming you load x, y pairs
        self.outq.put((x[None, ...], y[None, ...]))  # add singleton "batch" dimension
      except Exception:
        # handle errors?
        pass

class MultiProcessInputLayer(caffe.Layer):
  def setup(self, bottom, top):
    # verify no bottoms, right number of tops etc.
    self.dataQ = multiprocessing.Queue()
    for _ in xrange(n):
      Loader(self.dataQ)  # start n Loaders
    # some other stuff here...

  def reshape(self, bottom, top):
    # reshape the tops to the right sizes
    pass

  def forward(self, bottom, top):
    for i in xrange(batch_size):
      item = self.dataQ.get()  # blocks until an item is ready
      top[0].data[i, ...] = item[0]
      top[1].data[i, ...] = item[1]

  def backward(self, top, propagate_down, bottom):
    pass  # no backward pass for a data layer
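To actually use this layer in a net, it is wired in through caffe's `"Python"` layer type. A minimal sketch of the prototxt entry, assuming the class above lives in a module named `my_data_layer` on the `PYTHONPATH` (the module name is an assumption for illustration):

    layer {
      name: "data"
      type: "Python"
      top: "data"
      top: "label"
      python_param {
        module: "my_data_layer"          # hypothetical module name
        layer: "MultiProcessInputLayer"  # the class defined above
      }
    }

Note that caffe must be built with `WITH_PYTHON_LAYER := 1` for `"Python"` layers to be available.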


Some tips and tricks I learned the hard way:
1. Use the multiprocessing package, not threading, because of the GIL.
2. Sometimes (e.g. if batch_size is very large) it will take very long for forward() to read item by item from the Queue to form each batch. In that case, you might add another multiprocessing.Process that will asynchronously read batch_size items from self.dataQ and write whole batches to self.batchQ. Then forward() will only wait for a single item from self.batchQ at each call.
3. Take care not to copy the data around too much. Working with large images/labels can turn all this copying into a bottleneck.
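Tip 2 can be sketched as a small "batcher" process sitting between the loaders' item queue and the layer. This is only a sketch under the assumptions of the answer above: the loaders put (x, y) pairs with a singleton batch dimension, and the class/queue names (Batcher, inq, outq) are made up for illustration.

```python
import multiprocessing

import numpy as np

class Batcher(multiprocessing.Process):
    """Reads batch_size items from inq and writes whole batches to outq,
    so forward() only waits for a single queue item per call."""
    def __init__(self, inq, outq, batch_size):
        super(Batcher, self).__init__()
        self.daemon = True
        self.inq = inq
        self.outq = outq
        self.batch_size = batch_size
        self.start()  # start working immediately, like the Loaders

    def run(self):
        while True:
            xs, ys = [], []
            for _ in range(self.batch_size):
                x, y = self.inq.get()
                xs.append(x)
                ys.append(y)
            # stack along the singleton "batch" dimension added by the Loaders
            self.outq.put((np.concatenate(xs, axis=0),
                           np.concatenate(ys, axis=0)))
```

forward() then becomes a single `self.batchQ.get()` followed by two array assignments, instead of a batch_size-long loop of queue reads.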

