Batch size does not work for Caffe with deploy.prototxt
Question
I'm trying to make my classification process a bit faster. I thought of increasing the first input_dim in my deploy.prototxt, but that does not seem to work: it is even slightly slower than classifying the images one by one.
deploy.prototxt
input: "data"
input_dim: 128
input_dim: 1
input_dim: 120
input_dim: 160
... net description ...
Python net initialization
import caffe
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
net.blobs['data'].reshape(128, 1, 120, 160)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
# transformer settings
Python classification
images = [None] * 128
for i in range(len(images)):
    images[i] = caffe.io.load_image('image_path', False)
for j in range(len(images)):
    net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', images[j])
out = net.forward()['prob']
I skipped some details, but the important parts should be there. I tried different batch sizes (32, 64, ..., 1024), but the timings are all nearly the same. Does anyone have an idea what I'm doing wrong or what needs to be changed? Thanks for your help!
Some timing results; the avg-times are just the total-times divided by the number of processed images (1044).
Batch size: 1
2016-05-04 10:51:20,721 - detector - INFO - data shape: (1, 1, 120, 160)
2016-05-04 10:51:35,149 - main - INFO - GPU timings:
2016-05-04 10:51:35,149 - main - INFO - processed images: 1044
2016-05-04 10:51:35,149 - main - INFO - total-time: 14.43s
2016-05-04 10:51:35,149 - main - INFO - avg-time: 13.82ms
2016-05-04 10:51:35,149 - main - INFO - load-time: 8.31s
2016-05-04 10:51:35,149 - main - INFO - avg-load-time: 7.96ms
2016-05-04 10:51:35,149 - main - INFO - classify-time: 5.99s
2016-05-04 10:51:35,149 - main - INFO - avg-classify-time: 5.74ms
Batch size: 32
2016-05-04 10:52:30,773 - detector - INFO - data shape: (32, 1, 120, 160)
2016-05-04 10:52:45,135 - main - INFO - GPU timings:
2016-05-04 10:52:45,135 - main - INFO - processed images: 1044
2016-05-04 10:52:45,135 - main - INFO - total-time: 14.36s
2016-05-04 10:52:45,136 - main - INFO - avg-time: 13.76ms
2016-05-04 10:52:45,136 - main - INFO - load-time: 7.13s
2016-05-04 10:52:45,136 - main - INFO - avg-load-time: 6.83ms
2016-05-04 10:52:45,136 - main - INFO - classify-time: 7.13s
2016-05-04 10:52:45,136 - main - INFO - avg-classify-time: 6.83ms
Batch size: 128
2016-05-04 10:53:17,478 - detector - INFO - data shape: (128, 1, 120, 160)
2016-05-04 10:53:31,299 - main - INFO - GPU timings:
2016-05-04 10:53:31,299 - main - INFO - processed images: 1044
2016-05-04 10:53:31,299 - main - INFO - total-time: 13.82s
2016-05-04 10:53:31,299 - main - INFO - avg-time: 13.24ms
2016-05-04 10:53:31,299 - main - INFO - load-time: 7.06s
2016-05-04 10:53:31,299 - main - INFO - avg-load-time: 6.77ms
2016-05-04 10:53:31,299 - main - INFO - classify-time: 6.66s
2016-05-04 10:53:31,299 - main - INFO - avg-classify-time: 6.38ms
Batch size: 1024
2016-05-04 10:54:11,546 - detector - INFO - data shape: (1024, 1, 120, 160)
2016-05-04 10:54:25,316 - main - INFO - GPU timings:
2016-05-04 10:54:25,316 - main - INFO - processed images: 1044
2016-05-04 10:54:25,316 - main - INFO - total-time: 13.77s
2016-05-04 10:54:25,316 - main - INFO - avg-time: 13.19ms
2016-05-04 10:54:25,316 - main - INFO - load-time: 7.04s
2016-05-04 10:54:25,316 - main - INFO - avg-load-time: 6.75ms
2016-05-04 10:54:25,316 - main - INFO - classify-time: 6.63s
2016-05-04 10:54:25,316 - main - INFO - avg-classify-time: 6.35ms
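The averages in these logs can be reproduced from the totals, which also makes it easier to compare the batch sizes side by side. The following is a minimal sketch that uses only the figures printed above; the dictionary layout and the helper name `per_image_ms` are illustrative assumptions and not part of the original script.

```python
# Totals copied from the logs above: batch size -> (total, load, classify) in seconds.
PROCESSED_IMAGES = 1044
TOTALS_S = {
    1: (14.43, 8.31, 5.99),
    32: (14.36, 7.13, 7.13),
    128: (13.82, 7.06, 6.66),
    1024: (13.77, 7.04, 6.63),
}


def per_image_ms(total_seconds, n_images=PROCESSED_IMAGES):
    """Average time per image in milliseconds, rounded to two decimals like the logs."""
    return round(total_seconds / n_images * 1000.0, 2)


# Print one summary line per batch size.
for batch_size, (total, load, classify) in sorted(TOTALS_S.items()):
    print("batch %4d: avg %.2f ms, load %.2f ms, classify %.2f ms" % (
        batch_size, per_image_ms(total), per_image_ms(load), per_image_ms(classify)))
```

With these numbers, loading accounts for roughly half of the total time at every batch size, so a larger batch could at best speed up the remaining classification part.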
Recommended answer
I'm pretty sure the problem is in these lines:
for j in range(len(images)):
    net.blobs['data'].data[j, :, :, :] = transformer.preprocess('data', images[j])
out = net.forward()['prob']
Doing this will simply set the single image data from the last iteration of the for loop as the network's only input. Try stacking the N images (say stackedimages) beforehand and calling the line only once, e.g.
import numpy as np
# Preprocess every image and stack the results along a new first (batch) axis.
stackedimages = np.array([transformer.preprocess('data', img) for img in images])
Then call
net.blobs['data'].data[...] = stackedimages
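Putting the pieces together, one way to run all 1044 images through the net with a fixed batch size is to split them into chunks, fill the `data` blob once per chunk, and call `net.forward()` once per chunk. The following is a minimal sketch and not the answerer's code: `iter_batches` and `classify_in_batches` are hypothetical helper names, `net` and `transformer` are assumed to be set up as in the question, and the output blob is assumed to be called `prob`.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def classify_in_batches(net, transformer, images, batch_size):
    """Run `images` through `net` in chunks, with one forward pass per chunk.

    Assumes `net` is a caffe.Net whose input blob is 'data' and whose output
    blob is 'prob', and that `transformer` is a matching caffe.io.Transformer.
    """
    _, channels, height, width = net.blobs['data'].data.shape
    # Resize the input blob once so every forward pass has `batch_size` slots.
    net.blobs['data'].reshape(batch_size, channels, height, width)
    results = []
    for chunk in iter_batches(images, batch_size):
        for j, image in enumerate(chunk):
            net.blobs['data'].data[j, ...] = transformer.preprocess('data', image)
        probs = net.forward()['prob']
        # The last chunk may be shorter, so ignore the stale slots beyond it.
        # Copy, because the returned array is a view of the blob's memory
        # and is overwritten by the next forward pass.
        results.extend(probs[:len(chunk)].copy())
    return results
```

For example, `classify_in_batches(net, transformer, images, 128)` would perform nine forward passes for 1044 images: eight full batches and one final batch of 20 images.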