Batch processing mode in Caffe - no performance gains

Problem description

Following on this thread, I reimplemented my image processing code to send in 10 images at once (i.e. the num property of the input blob is now set to 100 instead of 10).

However, processing this batch takes 10 times longer than before, which means I did not get any performance increase.

Is that reasonable, or did I do something wrong?

I am running Caffe in CPU mode. Unfortunately, GPU mode is not an option for me.

Recommended answer

Update: Caffe now natively supports parallel processing of multiple images when using multiple GPUs. Although it seems relatively simple to implement based on the current GPU parallelism code, at the moment there is no similar support for parallel processing on multiple CPUs.

The main problem with implementing parallelism is the syncing you need during training. If you just want to process your images in parallel (as opposed to training the model), you could load several copies of the same network into memory (whether through Python with multiprocessing or C++ with multi-threading) and process each image on a different copy. It would be simple and quite effective, especially if you load the networks once and then process a large number of images. Nevertheless, GPUs are much faster :)
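As a rough illustration of the "several copies of the network" idea, here is a minimal Python sketch using `multiprocessing.Pool` with an initializer, so each worker process loads its own copy once and reuses it. The names `StandInNet`, `load_net`, `init_worker`, `classify` and `run_parallel` are hypothetical and not part of Caffe; the stand-in class only exists so the sketch runs without Caffe installed. With pycaffe you would presumably load a real `caffe.Net` inside `load_net` instead.

```python
import multiprocessing as mp

_net = None  # each worker process holds its own network copy


class StandInNet:
    """Placeholder so the sketch runs without Caffe (an assumption).

    With pycaffe, load_net() would instead call caffe.set_mode_cpu()
    and return a caffe.Net built from your model and weights files.
    """

    def forward(self, image):
        # Dummy "inference": just sum the pixel values.
        return sum(image)


def load_net():
    # Hypothetical loader; swap in the real model loading here.
    return StandInNet()


def init_worker():
    # Runs once per worker process, so the network is loaded once
    # and reused for every image that worker handles.
    global _net
    _net = load_net()


def classify(image):
    # Process a single image on this worker's private network copy.
    return _net.forward(image)


def run_parallel(images, workers=4):
    # Distribute the images across the worker processes.
    with mp.Pool(processes=workers, initializer=init_worker) as pool:
        return pool.map(classify, images)


if __name__ == "__main__":
    images = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # made-up data
    print(run_parallel(images, workers=2))
```

Loading in the initializer rather than inside `classify` is what keeps the per-image cost low: the model is read once per process, not once per image.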

Caffe doesn't process multiple images in parallel. The only saving you get by batching several images is in the time it takes to transfer the image data to and from Caffe's framework, which could be significant when dealing with the GPU.

IIRC there have been several attempts to make Caffe process images in parallel, but most focus on the GPU implementation (CUDNN, CUDA Streams etc.), with few attempts to add parallelism to the CPU code (OpenBLAS's multithread mode, or simply running on multiple threads). Of those, I believe only the CUDNN option is currently part of the stable version of Caffe, and it obviously requires a GPU. You can try one of the pull requests about this on Caffe's GitHub page and see if it works for you, but note that it might cause compatibility issues with your current version.
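If your Caffe build happens to be linked against a multithreaded OpenBLAS (an assumption, this depends on how it was compiled), the number of BLAS threads can be controlled through OpenBLAS's `OPENBLAS_NUM_THREADS` environment variable. A small sketch, where the helper name `set_openblas_threads` is hypothetical:

```python
import os


def set_openblas_threads(n):
    # Call this before the BLAS library is loaded, i.e. before
    # importing caffe or numpy; later changes may have no effect.
    os.environ["OPENBLAS_NUM_THREADS"] = str(n)
    return os.environ["OPENBLAS_NUM_THREADS"]


if __name__ == "__main__":
    print(set_openblas_threads(4))
```

If you also run several network copies in separate processes, it is probably worth setting this to 1 per process so the processes do not oversubscribe the CPU cores.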

This is one such version that I've used in the past, though it's no longer maintained: https://github.com/BVLC/caffe/pull/439

I've also noticed in the last comment of the above issue that there is some speed-up to the CPU code in this pull request as well, though I've never tried it myself: https://github.com/BVLC/caffe/pull/2610
