Fast way to import and crop a JPEG in Python
Question
I have a Python app that imports 200k+ images, crops them, and presents the cropped image to pyzbar to interpret a barcode. Cropping helps because there are multiple barcodes on the image and, presumably, pyzbar is a little faster when given smaller images.
Currently I am using Pillow to import and crop the image.
On average, importing and cropping an image takes 262 msecs and pyzbar takes 8 msecs.
A typical run is about 21 hours.
I wonder if a library other than Pillow might offer substantial improvements in loading/cropping. Ideally the library should be available for macOS, but I could also run the whole thing in a virtual Ubuntu machine.
I am working on a version that can run in parallel processes, which will be a big improvement, but if I could get a 25% or greater speed increase from a different library I would add that as well.
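For reference, that parallel-process version might be sketched with the standard library's multiprocessing module. This is a minimal sketch, assuming the Pillow crop box used later in this question; crop_one is a hypothetical helper, and the pyzbar decode step is omitted (it would go inside crop_one):

```python
import sys
from multiprocessing import Pool
from PIL import Image

# Crop box from the code below: (left, upper, right, lower)
CROP_BOX = (0, 150, 270, 1050)

def crop_one(path):
    """Open and crop one image; the pyzbar decode step would go here too."""
    img = Image.open(path).crop(CROP_BOX)
    return path, img.size

if __name__ == '__main__':
    # One worker process per CPU core by default
    with Pool() as pool:
        for path, size in pool.imap_unordered(crop_one, sys.argv[1:]):
            print(path, size)
```

imap_unordered yields results as workers finish them, which keeps all cores busy when some images decode faster than others.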
Answer
As you didn't provide a sample image, I made a dummy file with dimensions 2544x4200 at 1.1MB in size and it is provided at the end of the answer. I made 1,000 copies of that image and processed all 1,000 images for each benchmark.
As you only gave your code in the comments area, I took it, formatted it and made the best I could of it. I also put it in a loop so it can process many files for just one invocation of the Python interpreter - this becomes important when you have 20,000 files.
It looks like this:
#!/usr/bin/env python3

import sys
from PIL import Image

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
    print(f'Processing: {filename}')
    imgc = Image.open(filename).crop((0, 150, 270, 1050))
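If you want to measure the same per-image cost on your own files, here is a minimal timing sketch using the same Pillow crop; time_crop is a hypothetical helper, not part of the original code:

```python
import sys
import time
from PIL import Image

def time_crop(paths):
    """Return the average milliseconds spent opening and cropping each image."""
    start = time.perf_counter()
    for filename in paths:
        Image.open(filename).crop((0, 150, 270, 1050))
    return (time.perf_counter() - start) * 1000 / len(paths)

if __name__ == '__main__' and sys.argv[1:]:
    print(f'{time_crop(sys.argv[1:]):.1f} ms per image')
```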
My suspicion is that I can make that faster using:
- GNU Parallel, and/or
- pyvips
Here is a pyvips version of your code:
#!/usr/bin/env python3

import sys
import pyvips
import numpy as np

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
    print(f'Processing: {filename}')
    img = pyvips.Image.new_from_file(filename, access='sequential')
    roi = img.crop(0, 150, 270, 900)
    mem_img = roi.write_to_memory()
    # Make a numpy array from that buffer object
    nparr = np.ndarray(buffer=mem_img, dtype=np.uint8,
                       shape=[roi.height, roi.width, roi.bands])
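One thing to watch: Pillow's crop() takes a (left, upper, right, lower) box, whereas pyvips's crop() takes left, top, width and height, which is why the 1050 in the Pillow code becomes 900 here. A small conversion helper (hypothetical, not part of either library) makes the relationship explicit:

```python
def pil_box_to_vips_crop(box):
    """Convert a Pillow (left, upper, right, lower) box into
    pyvips (left, top, width, height) crop arguments."""
    left, upper, right, lower = box
    return left, upper, right - left, lower - upper

# The Pillow box (0, 150, 270, 1050) from the original code maps to
# the pyvips crop (0, 150, 270, 900) used above.
```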
Here are the results:

Sequential original code

./orig.py bc*jpg
224 seconds, i.e. 224 ms per image, same as you
Parallel original code
parallel ./orig.py ::: bc*jpg
55 seconds
Parallel original code, but passing as many filenames as possible
parallel -X ./orig.py ::: bc*jpg
42 seconds
Sequential pyvips
./vipsversion bc*
30 seconds, i.e. 7x as fast as PIL which was 224 seconds
Parallel pyvips
parallel ./vipsversion ::: bc*
32 seconds
Parallel pyvips, but passing as many filenames as possible
parallel -X ./vipsversion ::: bc*
5.2 seconds, i.e. this is the way to go :-)
Note that you can install GNU Parallel on macOS with homebrew:
brew install parallel