batch size does not work for caffe with deploy.prototxt


Question

I'm trying to make my classification process a bit faster. I thought of increasing the first input_dim in my deploy.prototxt, but that does not seem to work. It's even a little bit slower than classifying each image one by one.

deploy.prototxt

input: "data"  
input_dim: 128  
input_dim: 1  
input_dim: 120  
input_dim: 160  
... net description ...
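
For reference, the first input_dim is the batch dimension (the dims are N, C, H, W), so 128 here requests a batch of 128 grayscale 120x160 images. Newer Caffe releases express the same header with an input_shape block; a sketch of the equivalent, assuming the rest of the net is unchanged:

input: "data"
input_shape {
  dim: 128
  dim: 1
  dim: 120
  dim: 160
}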

python net initialization

import caffe

net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
net.blobs['data'].reshape(128, 1, 120, 160)  # batch of 128 grayscale 120x160 images
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
# transformer settings
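
The transformer settings are omitted in the question. For single-channel images loaded with caffe.io.load_image(path, False) (which returns an H x W x 1 float array in [0, 1]), a typical configuration might look like the sketch below; these are illustrative values, not necessarily the settings actually used here:

transformer.set_transpose('data', (2, 0, 1))  # H x W x C -> C x H x W, the layout Caffe expects
transformer.set_raw_scale('data', 255)        # rescale [0, 1] inputs if the model was trained on [0, 255] data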

python classification

images = [None] * 128
for i in range(len(images)):
    images[i] = caffe.io.load_image('image_path', False)
for j in range(len(images)):
    net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', images[j])
out = net.forward()['prob']

I skipped some details, but the important parts should be there. I tried different batch sizes, like 32, 64, ..., 1024, but the results are all nearly the same. So my question is: does anyone have an idea what I'm doing wrong or what needs to be changed? Thanks for the help!


Some timing results; the avg times are just the total times divided by the number of processed images (1044).

Batch size: 1

2016-05-04 10:51:20,721 - detector - INFO - data shape: (1, 1, 120, 160)
2016-05-04 10:51:35,149 - main - INFO - GPU timings:
2016-05-04 10:51:35,149 - main - INFO - processed images: 1044
2016-05-04 10:51:35,149 - main - INFO - total-time: 14.43s
2016-05-04 10:51:35,149 - main - INFO - avg-time: 13.82ms
2016-05-04 10:51:35,149 - main - INFO - load-time: 8.31s
2016-05-04 10:51:35,149 - main - INFO - avg-load-time: 7.96ms
2016-05-04 10:51:35,149 - main - INFO - classify-time: 5.99s
2016-05-04 10:51:35,149 - main - INFO - avg-classify-time: 5.74ms

Batch size: 32

2016-05-04 10:52:30,773 - detector - INFO - data shape: (32, 1, 120, 160)
2016-05-04 10:52:45,135 - main - INFO - GPU timings:
2016-05-04 10:52:45,135 - main - INFO - processed images: 1044
2016-05-04 10:52:45,135 - main - INFO - total-time: 14.36s
2016-05-04 10:52:45,136 - main - INFO - avg-time: 13.76ms
2016-05-04 10:52:45,136 - main - INFO - load-time: 7.13s
2016-05-04 10:52:45,136 - main - INFO - avg-load-time: 6.83ms
2016-05-04 10:52:45,136 - main - INFO - classify-time: 7.13s
2016-05-04 10:52:45,136 - main - INFO - avg-classify-time: 6.83ms

Batch size: 128

2016-05-04 10:53:17,478 - detector - INFO - data shape: (128, 1, 120, 160)
2016-05-04 10:53:31,299 - main - INFO - GPU timings:
2016-05-04 10:53:31,299 - main - INFO - processed images: 1044
2016-05-04 10:53:31,299 - main - INFO - total-time: 13.82s
2016-05-04 10:53:31,299 - main - INFO - avg-time: 13.24ms
2016-05-04 10:53:31,299 - main - INFO - load-time: 7.06s
2016-05-04 10:53:31,299 - main - INFO - avg-load-time: 6.77ms
2016-05-04 10:53:31,299 - main - INFO - classify-time: 6.66s
2016-05-04 10:53:31,299 - main - INFO - avg-classify-time: 6.38ms

Batch size: 1024

2016-05-04 10:54:11,546 - detector - INFO - data shape: (1024, 1, 120, 160)
2016-05-04 10:54:25,316 - main - INFO - GPU timings:
2016-05-04 10:54:25,316 - main - INFO - processed images: 1044
2016-05-04 10:54:25,316 - main - INFO - total-time: 13.77s
2016-05-04 10:54:25,316 - main - INFO - avg-time: 13.19ms
2016-05-04 10:54:25,316 - main - INFO - load-time: 7.04s
2016-05-04 10:54:25,316 - main - INFO - avg-load-time: 6.75ms
2016-05-04 10:54:25,316 - main - INFO - classify-time: 6.63s
2016-05-04 10:54:25,316 - main - INFO - avg-classify-time: 6.35ms

Answer

I'm pretty sure the problem lies in

for j in range(len(images)):
    net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', images[j])
out = net.forward()['prob']

Doing this will simply set the single image data from the last iteration of the for loop as the network's only input. Try stacking the N images beforehand (say, into stackedimages) and calling that line only once, e.g.:

stackedimages = []
for j in range(len(images)):
    stackedimages.append(transformer.preprocess('data', images[j]))

and then call

net.blobs['data'].data[...] = stackedimages
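
Putting the answer together, a minimal end-to-end sketch of the batched forward pass (variable names are illustrative; it assumes numpy is available for stacking):

import numpy as np

# Preprocess every image once, then stack into a single (N, 1, 120, 160) array.
batch = np.stack([transformer.preprocess('data', img) for img in images])
net.blobs['data'].reshape(*batch.shape)   # keep the input blob in sync with the batch size
net.blobs['data'].data[...] = batch
out = net.forward()['prob']               # one probability vector per image in the batch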
