PyTesseract call working very slow when used along with multiprocessing
Problem Description
I have a function that takes in a list of images and produces the output, in a list, after applying OCR to each image. I have another function that controls the input to this function, using multiprocessing. So, when I have a single list (i.e. no multiprocessing), each image in the list took ~1s, but when I increased the number of lists to be processed in parallel to 4, each image took an astounding 13s.
To understand where the problem really is, I tried to create a minimal working example. Here I have two functions, `eat25` and `eat100`, which open an image `name` and feed it to the OCR, using the `pytesseract` API. `eat25` does this 25 times, and `eat100` does it 100 times.
My aim here is to run `eat100` without multiprocessing, and `eat25` with multiprocessing (with 4 processes). In theory, this should take a quarter of the time of `eat100` if I have 4 separate processors (I have 2 cores with 2 threads per core, thus CPU(s) = 4; correct me if I'm wrong here).
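As a sanity check on that processor count, Python can report the number of logical CPUs directly. Note that `os.cpu_count()` counts logical cores (hardware threads), not physical cores:

```python
import os
import multiprocessing

print(os.cpu_count())               # logical CPUs (hardware threads)
print(multiprocessing.cpu_count())  # same value via the older API
```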
But all that theory lay wasted when I saw that the code didn't even respond after printing "Processing 0" 4 times. The single-process function `eat100` worked fine, though.
I had tested a simple range-cubing function, and it did work well with multiprocessing, so my processors certainly work. The only culprits here could be:
- `pytesseract`: see this.
- Bad code? Something I am not doing right.
```python
from pathos.multiprocessing import ProcessingPool
from time import time
from PIL import Image
import pytesseract as pt

def eat25(name):
    for i in range(25):
        print('Processing :' + str(i))
        pt.image_to_string(Image.open(name), lang='hin+eng', config='--psm 6')

def eat100(name):
    for i in range(100):
        print('Processing :' + str(i))
        pt.image_to_string(Image.open(name), lang='hin+eng', config='--psm 6')

st = time()
eat100('normalBox.tiff')
en = time()
print('Direct :' + str(en - st))

# Using pathos
def caller():
    pool = ProcessingPool()
    pool.map(eat25, ['normalBox.tiff', 'normalBox.tiff', 'normalBox.tiff', 'normalBox.tiff'])

if __name__ == '__main__':
    caller()

en2 = time()
print('Pathos :' + str(en2 - en))
```
So, where is the problem, really? Any help is appreciated!
The image `normalBox.tiff` can be found here. I would be glad if people reproduced the code and checked whether the problem continues.
Answer
I'm the `pathos` author. If your code takes `1s` to run serially, then it's quite possible that it will take longer to run with naive process parallelism. There is overhead to working with naive process parallelism:
- a new Python instance has to be spun up on each processor
- your function and its dependencies need to be serialized and sent to each processor
- your data needs to be serialized and sent to the processors
- the same applies for deserialization
- you can run into memory problems, either from long-lived pools or from serializing large amounts of data
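To make the serialization overhead concrete, here is a minimal sketch, using the stdlib `pickle` rather than `dill` (they behave the same way for plain data), showing that the cost of shipping an argument to a worker grows with its size:

```python
import pickle
import time

small = b'x' * 1_000        # ~1 KB payload
large = b'x' * 10_000_000   # ~10 MB payload

# Every argument sent to a worker process pays this cost, both ways
for name, payload in [('small', small), ('large', large)]:
    t0 = time.perf_counter()
    blob = pickle.dumps(payload)
    t1 = time.perf_counter()
    print(f'{name}: {len(blob)} bytes serialized in {t1 - t0:.6f}s')
```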
I'd suggest checking a few simple things to see where your issues might be:
- Try the `pathos.pools.ThreadPool` to use thread parallelism instead of process parallelism. This can reduce some of the overhead for serialization and for spinning up the pool.
- Try the `pathos.pools._ProcessPool` to change how `pathos` manages the pool. Without the underscore, `pathos` keeps the pool around as a singleton, and requires a 'terminate' to explicitly kill the pool. With the underscore, the pool dies when you delete the pool object. Note that your `caller` function does not `close` or `join` (or `terminate`) the pool.
- You might want to check how much you are serializing by trying to `dill.dumps` one of the elements you are trying to process in parallel. Things like big `numpy` arrays can take a while to serialize. If the size of what is being passed around is large, you might consider using a shared-memory array (i.e. a `multiprocess.Array`, or the equivalent version for `numpy` arrays; also see `numpy.ctypeslib`) to minimize what is being passed between the processes.
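A minimal sketch of the first suggestion, plus the missing pool cleanup, using the stdlib `multiprocessing.pool.ThreadPool` (the `pathos.pools.ThreadPool` interface is essentially the same). Here `work` is a stand-in for the OCR call, not `pytesseract` itself:

```python
from multiprocessing.pool import ThreadPool

def work(name):
    # Stand-in for pt.image_to_string(Image.open(name), ...)
    return len(name)

pool = ThreadPool(4)
try:
    results = pool.map(work, ['a.tiff', 'b.tiff', 'c.tiff', 'd.tiff'])
finally:
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to finish
print(results)
```

Threads avoid the per-process serialization cost entirely, which matters here because each task ships an image path and an OCR result.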
The latter is a bit more work, but can provide huge savings if you have a lot to serialize. There is no shared-memory pool, so if you need to go that route, you have to do a for loop over individual `multiprocess.Process` objects.