Speed difference between ImageDataLayer and the LMDB data layer


Question

Caffe supports both an LMDB data layer and ImageDataLayer. Creating an LMDB database from a dataset takes some time and a lot of disk space. In contrast, ImageDataLayer only needs a text file, which is very convenient. My question is: is there a big speed difference between these two kinds of layers? Thank you very much!

Recommended answer

LMDB is designed for fast retrieval of a value by its key. In addition, the data is stored uncompressed, so the machine can simply read it and pass it on to the GPU for processing, with no decoding step.

With ImageDataLayer, Caffe has to read each image's details from the text file and then use OpenCV to load the image into memory. Decompressing the image this way is computationally expensive.
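For illustration, the text file that ImageDataLayer reads lists one image per line as an image path followed by an integer label. The sketch below parses such a list in plain Python; the function name `parse_image_list` and the sample paths are made up for this example, and it is not Caffe's own parsing code.

```python
def parse_image_list(text):
    """Parse lines of the form '<image path> <integer label>'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the label is the final token.
        path, label = line.rsplit(None, 1)
        entries.append((path, int(label)))
    return entries


# Hypothetical sample list; the paths are placeholders.
sample = "images/cat_001.jpg 0\nimages/dog_001.jpg 1\n"
entries = parse_image_list(sample)
print(entries)
```

Every entry still has to be opened and decoded at training time, which is where the cost described above comes from.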

However, the LMDB layer does not always give the best performance; it depends heavily on the machine's configuration. Consider a batch size of 256 with images of size 227x227x3, on a machine with a very good GPU and a high-end processor. A single image in LMDB format occupies about 151 KB (227 × 227 × 3 bytes), so a whole batch occupies about 37 MB. If the GPU can process 10 batches per second, the disk has to sustain a read speed of roughly 370 MB/s. With an ordinary SATA or external hard disk, reading such large chunks of data becomes a bottleneck because of the disk's limits.
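The arithmetic behind these figures can be checked with a short sketch. It assumes one byte per channel value and ignores any per-record overhead in the database, so the numbers are lower bounds; the function name `required_read_rate` is made up for this example.

```python
def required_read_rate(height, width, channels, batch_size, batches_per_second):
    """Return (bytes per image, bytes per batch, bytes per second)."""
    # Assumption: uncompressed 8-bit data, one byte per channel value.
    bytes_per_image = height * width * channels
    bytes_per_batch = bytes_per_image * batch_size
    bytes_per_second = bytes_per_batch * batches_per_second
    return bytes_per_image, bytes_per_batch, bytes_per_second


per_image, per_batch, per_second = required_read_rate(227, 227, 3, 256, 10)

KIB = 1024
MIB = 1024 * 1024
print(f"per image:  {per_image / KIB:.1f} KiB")
print(f"per batch:  {per_batch / MIB:.1f} MiB")
print(f"per second: {per_second / MIB:.1f} MiB/s")
```

The results land slightly above the rounded figures in the answer (about 151 KiB per image, just under 38 MiB per batch, and a little under 380 MiB/s), which does not change the conclusion.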

If Caffe cannot fetch data at the required rate, this bottleneck slows down the whole training process. On the other hand, if you read the 256 images from their individual files and decode them with a multi-core build of OpenCV, data prefetching may be handled more efficiently than reading from an LMDB.
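As a rough illustration of loading a batch of image files in parallel, here is a minimal sketch using a thread pool from the Python standard library. It is not how Caffe implements prefetching; `load_one` is a stand-in that only reads the raw file bytes, where a real pipeline would call an image decoder such as OpenCV's `cv2.imread`, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def load_one(path):
    """Stand-in for an image decoder: just return the file's raw bytes."""
    with open(path, "rb") as f:
        return f.read()


def load_batch(paths, workers=4):
    """Load all files concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_one, paths))
```

Whether this actually beats a sequential LMDB read depends on the disk, the number of cores, and how well the decoder parallelizes, so it is worth measuring on the target machine.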

None of this applies if the LMDB data is stored on an SSD, though!
