How to write a caffe python data layer with preload?


Problem Description

How do I write an asynchronous data layer that preloads batches while other processing is performed? Are there some example codes? Thanks.

Recommended Answer

There are several ways you can achieve what you want. I'll try and sketch one option here.

The overall view of the system is: you have n Loaders asynchronously loading data and feeding a queue. The layer then reads batch_size items from the queue and feeds the net in the forward() function.

import multiprocessing

import caffe


class Loader(multiprocessing.Process):
  def __init__(self, outq, *args, **kwargs):
    super(Loader, self).__init__()
    self.daemon = True  # die together with the main process
    self.outq = outq
    self.start()  # start working immediately

  def run(self):
    while True:  # read and never stop at all!
      try:
        # do your magic here: load a single x, y pair
        self.outq.put((x[None, ...], y[None, ...]))  # add singleton "batch" dimension
      except Exception:
        pass  # handle/log errors as appropriate


class MultiProcessInputLayer(caffe.Layer):
  def setup(self, bottom, top):
    # verify there are no bottoms, the right number of tops, etc.
    self.batch_size = 32  # example value; could be parsed from self.param_str
    self.dataQ = multiprocessing.Queue()
    for _ in range(n):  # start n Loaders
      Loader(self.dataQ)
    # some other stuff here...

  def reshape(self, bottom, top):
    # reshape the tops to the right sizes, e.g.
    # top[0].reshape(self.batch_size, c, h, w)
    pass

  def forward(self, bottom, top):
    for i in range(self.batch_size):
      item = self.dataQ.get()
      top[0].data[i, ...] = item[0]
      top[1].data[i, ...] = item[1]

  def backward(self, top, propagate_down, bottom):
    pass  # no backward pass for a data layer
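To hook such a layer into a network, the prototxt would declare it as a "Python" layer. The module and layer names below are assumptions matching the sketch above (the module is whatever .py file holds the class, on the PYTHONPATH):

```protobuf
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "my_data_layer"          # hypothetical file my_data_layer.py
    layer: "MultiProcessInputLayer"  # the class defined above
  }
}
```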

Some tips and tricks I learned the hard way:
1. Use the multiprocessing package and not threading, because of the GIL.
2. Sometimes (e.g. if batch_size is very large) it will take forward() a long time to read item by item from the Queue to form each batch. In that case, you can add another multiprocessing.Process that asynchronously reads batch_size items from self.dataQ and writes whole batches to self.batchQ. forward() then only waits for a single item from self.batchQ on each call.
3. Take care not to copy the data around too much. Working with large images/labels can turn all this copying into a bottleneck.
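Tip 2 can be sketched roughly as follows. This is a minimal illustration, not part of caffe; the names Batcher and collect_batch are hypothetical:

```python
import multiprocessing


def collect_batch(q, n):
  # drain n single items from a queue into one list (one whole batch)
  return [q.get() for _ in range(n)]


class Batcher(multiprocessing.Process):
  # hypothetical helper: repeatedly assembles whole batches from in_q
  # so that forward() only needs a single get() from out_q per call
  def __init__(self, in_q, out_q, batch_size):
    super(Batcher, self).__init__()
    self.daemon = True
    self.in_q = in_q
    self.out_q = out_q
    self.batch_size = batch_size
    self.start()

  def run(self):
    while True:
      self.out_q.put(collect_batch(self.in_q, self.batch_size))
```

In the layer, setup() would then start one Batcher(self.dataQ, self.batchQ, self.batch_size), and forward() would do a single self.batchQ.get().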
