Fast way to import and crop a JPEG in Python

Question

I have a Python app that imports 200k+ images, crops them, and passes each cropped image to pyzbar to decode a barcode. Cropping helps because each image contains multiple barcodes and, presumably, pyzbar is a little faster when given smaller images.

Currently I am using Pillow to import and crop the image.

On average, importing and cropping an image takes 262 ms, while pyzbar takes 8 ms.

A typical run takes about 21 hours.

I wonder whether a library other than Pillow might offer substantial improvements in loading and cropping. Ideally the library should be available for macOS, but I could also run the whole thing in an Ubuntu virtual machine.

I am working on a version that runs in parallel processes, which will be a big improvement, but if a different library gave me a speed increase of 25% or more I would add that as well.

Recommended answer

As you didn't provide a sample image, I made a dummy file with dimensions 2544x4200, 1.1 MB in size, and it is provided at the end of the answer. I made 1,000 copies of that image and processed all 1,000 images for each benchmark.

As you only gave your code in the comments area, I took it, formatted it and made the best I could of it. I also put it in a loop so that a single invocation of the Python interpreter can process many files, which becomes important when you have 200k+ files.

It looks like this:

#!/usr/bin/env python3

import sys
from PIL import Image

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
   print(f'Processing: {filename}')
   # crop() takes a box of (left, upper, right, lower) pixel coordinates
   imgc = Image.open(filename).crop((0, 150, 270, 1050))

My suspicion is that I can make that faster using:

  • GNU Parallel, and/or
  • pyvips

Here is a pyvips version of your code:

#!/usr/bin/env python3

import sys
import pyvips
import numpy as np

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
   print(f'Processing: {filename}')

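   # Sequential access reads the file top to bottom, which suits a single crop
   img = pyvips.Image.new_from_file(filename, access='sequential')
   # crop() takes (left, top, width, height): the same region as Pillow's (0, 150, 270, 1050) box
   roi = img.crop(0, 150, 270, 900)
   mem_img = roi.write_to_memory()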

   # Make a numpy array from that buffer object
   nparr = np.ndarray(buffer=mem_img, dtype=np.uint8,
                   shape=[roi.height, roi.width, roi.bands])
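The two scripts describe the same region in different ways: Pillow's crop takes a box of corner coordinates, while pyvips's crop takes an origin plus a width and height. The helper below is a small sketch of that conversion. The name box_to_vips_crop is hypothetical and not part of either library.

```python
def box_to_vips_crop(box):
    """Convert a Pillow-style (left, upper, right, lower) box into
    pyvips-style (left, top, width, height) crop arguments.

    Hypothetical helper for illustration; neither library provides it.
    """
    left, upper, right, lower = box
    if right <= left or lower <= upper:
        raise ValueError(f'empty or inverted box: {box}')
    return left, upper, right - left, lower - upper


# The Pillow box from the original script maps to the pyvips arguments used above
print(box_to_vips_crop((0, 150, 270, 1050)))
```

With this in place, the pyvips call could be written as img.crop(*box_to_vips_crop(box)), so both versions share one definition of the region.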

Here are the results:

Sequential original code

./orig.py bc*jpg
224 seconds, i.e. 224 ms per image, roughly in line with the 262 ms you measured

Parallel original code

parallel ./orig.py ::: bc*jpg
55 seconds

Parallel original code, but passing as many filenames as possible

parallel -X ./orig.py ::: bc*jpg
42 seconds

Sequential pyvips

./vipsversion bc*
30 seconds, i.e. about 7x as fast as Pillow, which took 224 seconds

Parallel pyvips

parallel ./vipsversion ::: bc*
32 seconds

Parallel pyvips, but passing as many filenames as possible

parallel -X ./vipsversion ::: bc*
5.2 seconds, i.e. this is the way to go :-)
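To put those numbers in terms of a full run, here is a rough projection. Each benchmark used 1,000 images, so the total in seconds equals the per-image cost in milliseconds. The sketch assumes the per-image cost scales linearly to 200,000 images (the lower bound of the "200k+" in the question) and ignores the 8 ms pyzbar step; both are assumptions, not measurements.

```python
N_IMAGES = 200_000  # assumption: lower bound of the "200k+" in the question

# Per-image load-and-crop cost in ms, taken from the 1,000-image benchmarks above
BENCHMARKS_MS = {
    'Pillow, sequential': 224,
    'Pillow, parallel': 55,
    'Pillow, parallel -X': 42,
    'pyvips, sequential': 30,
    'pyvips, parallel': 32,
    'pyvips, parallel -X': 5.2,
}


def projected_hours(ms_per_image, n_images=N_IMAGES):
    """Projected wall-clock hours, assuming cost scales linearly with image count."""
    return n_images * ms_per_image / 1000 / 3600


for name, ms in BENCHMARKS_MS.items():
    print(f'{name:<22} {projected_hours(ms):6.2f} h')
```

On these assumptions the load-and-crop stage drops from roughly half a day to well under an hour, at which point the pyzbar step is likely to dominate the run.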

Note that you can install GNU Parallel on macOS with Homebrew:

brew install parallel
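If you would rather keep everything inside Python, the batching idea behind parallel -X can be approximated with the standard library. This is only a sketch and was not part of the benchmarks above: the names split_batches, process_batch and run are hypothetical, and it assumes one batch per CPU core is a reasonable split.

```python
#!/usr/bin/env python3
import os
import sys
from concurrent.futures import ProcessPoolExecutor


def split_batches(items, n_batches):
    """Deal items round-robin into at most n_batches non-empty lists."""
    batches = [items[i::n_batches] for i in range(n_batches)]
    return [b for b in batches if b]


def process_batch(filenames):
    """Crop every file in one batch; returns how many files were handled."""
    # Imported here so the batching helper works even without pyvips installed
    import pyvips

    for filename in filenames:
        img = pyvips.Image.new_from_file(filename, access='sequential')
        roi = img.crop(0, 150, 270, 900)
        roi.write_to_memory()  # forces the decode; the barcode step would go here
    return len(filenames)


def run(filenames, workers=None):
    """Process all files, giving each worker process one large batch."""
    workers = workers or os.cpu_count() or 1
    batches = split_batches(list(filenames), workers)
    if not batches:
        return 0
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_batch, batches))


if __name__ == '__main__':
    print(f'Processed {run(sys.argv[1:])} files')
```

As with parallel -X, the point is that each process pays its startup and import cost once per batch rather than once per file.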
