Fast way to import and crop a JPEG in Python
Question
I have a Python app that imports 200k+ images, crops them, and presents the cropped image to pyzbar to interpret a barcode. Cropping helps because there are multiple barcodes on the image and, presumably, pyzbar is a little faster when given smaller images.
Currently I am using Pillow to import and crop the image.
On average, importing and cropping an image takes 262 msecs and pyzbar takes 8 msecs.
A typical run is about 21 hours.
I wonder if a library other than Pillow might offer substantial improvements in loading/cropping. Ideally the library should be available for macOS, but I could also run the whole thing in a virtual Ubuntu machine.
I am working on a version that can run in parallel processes, which will be a big improvement, but if I could get a 25% or greater speed increase from a different library I would add that as well.
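For reference, that parallel-process version might be sketched with the standard library's multiprocessing module. This is a minimal sketch, assuming the Pillow crop box used later in this question; crop_one is a hypothetical helper, and the pyzbar decode step is omitted (it would go inside crop_one):

```python
import sys
from multiprocessing import Pool
from PIL import Image

# Crop box from the code below: (left, upper, right, lower)
CROP_BOX = (0, 150, 270, 1050)

def crop_one(path):
    """Open and crop one image; the pyzbar decode step would go here too."""
    img = Image.open(path).crop(CROP_BOX)
    return path, img.size

if __name__ == '__main__':
    # One worker process per CPU core by default
    with Pool() as pool:
        for path, size in pool.imap_unordered(crop_one, sys.argv[1:]):
            print(path, size)
```

imap_unordered yields results as workers finish them, which keeps all cores busy when some images decode faster than others.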
Answer
As you didn't provide a sample image, I made a dummy file with dimensions 2544x4200 at 1.1MB in size and it is provided at the end of the answer. I made 1,000 copies of that image and processed all 1,000 images for each benchmark.
As you only gave your code in the comments area, I took it, formatted it and made the best I could of it. I also put it in a loop so it can process many files for just one invocation of the Python interpreter - this becomes important when you have 20,000 files.
It looks like this:
#!/usr/bin/env python3

import sys
from PIL import Image

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
    print(f'Processing: {filename}')
    imgc = Image.open(filename).crop((0, 150, 270, 1050))
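If you want to measure the same per-image cost on your own files, here is a minimal timing sketch using the same Pillow crop; time_crop is a hypothetical helper, not part of the original code:

```python
import sys
import time
from PIL import Image

def time_crop(paths):
    """Return the average milliseconds spent opening and cropping each image."""
    start = time.perf_counter()
    for filename in paths:
        Image.open(filename).crop((0, 150, 270, 1050))
    return (time.perf_counter() - start) * 1000 / len(paths)

if __name__ == '__main__' and sys.argv[1:]:
    print(f'{time_crop(sys.argv[1:]):.1f} ms per image')
```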
My suspicion is that I can make that faster using:
- GNU Parallel, and/or
- pyvips
Here is a pyvips version of your code:
#!/usr/bin/env python3

import sys
import pyvips
import numpy as np

# Process all input files so we only incur Python startup overhead once
for filename in sys.argv[1:]:
    print(f'Processing: {filename}')
    img = pyvips.Image.new_from_file(filename, access='sequential')
    roi = img.crop(0, 150, 270, 900)
    mem_img = roi.write_to_memory()
    # Make a numpy array from that buffer object
    nparr = np.ndarray(buffer=mem_img, dtype=np.uint8,
                       shape=[roi.height, roi.width, roi.bands])
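One thing to watch: Pillow's crop() takes a (left, upper, right, lower) box, whereas pyvips's crop() takes left, top, width and height, which is why the 1050 in the Pillow code becomes 900 here. A small conversion helper (hypothetical, not part of either library) makes the relationship explicit:

```python
def pil_box_to_vips_crop(box):
    """Convert a Pillow (left, upper, right, lower) box into
    pyvips (left, top, width, height) crop arguments."""
    left, upper, right, lower = box
    return left, upper, right - left, lower - upper

# The Pillow box (0, 150, 270, 1050) from the original code maps to
# the pyvips crop (0, 150, 270, 900) used above.
```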
Here are the results:

Sequential original code

./orig.py bc*jpg
224 seconds, i.e. 224 ms per image, same as you
Parallel original code
parallel ./orig.py ::: bc*jpg
55 seconds
Parallel original code, but passing as many filenames as possible
parallel -X ./orig.py ::: bc*jpg
42 seconds
Sequential pyvips
./vipsversion bc*
30 seconds, i.e. 7x as fast as PIL which was 224 seconds
Parallel pyvips
parallel ./vipsversion ::: bc*
32 seconds
Parallel pyvips, but passing as many filenames as possible
parallel -X ./vipsversion ::: bc*
5.2 seconds, i.e. this is the way to go :-)
Note that you can install GNU Parallel on macOS with homebrew:
brew install parallel