TesseractNotFound issue when containerizing in Docker

Question

Problem:

I have tesseract installed on my local machine at /usr/local/Cellar/tesseract/4.1.1/bin/tesseract. Everything worked perfectly until I containerized the app in Docker, where it fails with this error message: pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH

Things I've tried:

Based on the error message, this is what I've tried:

1). Added /usr/local under File Sharing in the Docker Desktop app and mounted that path from the local machine into the container. Still got the error message (doesn't work).

2). Moved tesseract.exe from where it resides to the current local working directory. Still got the error message (of course it doesn't work; what was I even thinking back then?).

3). Modified the Dockerfile to install tesseract with its dependencies. Here is the Dockerfile:

FROM python:3.7-alpine
RUN apk update && apk add --no-cache tesseract-ocr
WORKDIR /app
COPY ./requirements.txt ./ 
RUN pip3 install --upgrade pip
# install dependencies 
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade PyMuPDF
# bundle app source 
COPY . /app

COPY ./ChaseOCR.py /app
COPY ./BancAmericaOCR.py /app
COPY ./WellsFargoOCR.py /app

EXPOSE 8080

CMD ["python3", "MainBankClass.py"] 

The requirements.txt file also includes the pytesseract and tesseract dependencies. Still got the error message (doesn't work). I have been stuck on this issue for the past 2 days and am running out of options. This link and this link both didn't work in my case. Any help is much appreciated. Thanks in advance.
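Since pytesseract is a wrapper that calls the external tesseract executable, it can help to check the Python package and the binary separately inside the container. The following is only a minimal diagnostic sketch using the standard library; the function names `diagnose` and `explain` are made up for illustration and are not part of pytesseract. Note that it only inspects PATH, so it would not detect a path hard-coded in the application code.

```python
import importlib.util
import shutil


def diagnose(binary="tesseract", module="pytesseract"):
    """Report separately whether the Python wrapper and the OCR binary exist."""
    return {
        # True when the module can be imported in this interpreter
        "python_module_installed": importlib.util.find_spec(module) is not None,
        # Absolute path of the executable, or None when it is not on PATH
        "binary_path": shutil.which(binary),
    }


def explain(report):
    """Turn the report from diagnose() into a one-line hint."""
    if report["binary_path"] is None:
        return "tesseract executable is not on PATH; install it with the OS package manager"
    if not report["python_module_installed"]:
        return "tesseract executable found, but pytesseract is not installed (pip)"
    return "both found: " + report["binary_path"]
```

Running `print(explain(diagnose()))` inside the container shows which of the two pieces is missing, if any.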

Thanks to Neo for the solution. I am testing it now, but it is running very slowly, so I thought it would be better to share the requirements.txt file here in case there are other issues unrelated to tesseract.

requirements.txt:

numpy
pandas
opencv-python
Pillow
Image
pytesseract
tesseract
PyMuPDF
python-levenshtein
tabula-py

Local file directory:

testdockerfile
├─ .vscode
│  └─ settings.json
├─ BankofAmericaOCR.py
├─ ChaseOCR.py
├─ Dockerfile
├─ MainBankClass.py
├─ __init__.py
├─ WellsFargoOCR.py
└─ requirements.txt

For future reference, in case anyone else still gets the TesseractNotFound error after installing tesseract in the Docker image: comment out pytesseract.pytesseract.tesseract_cmd = r'/path/to/your/tesseract' if you set that path to run the code locally. After that, rebuild the image and run the new image in Docker. It should be fine.
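Instead of commenting the line in and out by hand, the path could be chosen at runtime. This is only a sketch under stated assumptions: the helper names `resolve_tesseract_cmd` and `configure_pytesseract` and the `TESSERACT_CMD` environment variable are hypothetical, not something pytesseract defines. The only pytesseract attribute used is `pytesseract.pytesseract.tesseract_cmd`, which appears in the post above.

```python
import os
import shutil


def resolve_tesseract_cmd(local_path=None, env_var="TESSERACT_CMD"):
    """Return the tesseract command to use, or None if none is found.

    Order: an explicit environment variable (hypothetical name), then the
    local path if that file exists on this machine, then whatever
    `tesseract` resolves to on PATH.
    """
    override = os.environ.get(env_var)
    if override:
        return override
    if local_path and os.path.isfile(local_path):
        return local_path
    return shutil.which("tesseract")


def configure_pytesseract(local_path=None):
    """Point pytesseract at the resolved binary; call once at startup."""
    import pytesseract  # imported lazily so this module loads without it

    cmd = resolve_tesseract_cmd(local_path)
    if cmd is None:
        raise RuntimeError("tesseract binary not found on this machine")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

With this approach, calling `configure_pytesseract("/usr/local/Cellar/tesseract/4.1.1/bin/tesseract")` would use the Homebrew path on the local machine and fall back to the binary on PATH inside the container, where that file does not exist.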

Recommended answer


Edit 3:
Some of the Python packages in requirements.txt have other prerequisites. With this Dockerfile, the entire build process completed successfully.

The trickiest part was building opencv.
Credits to https://github.com/janza/docker-python3-opencv/blob/master/Dockerfile

.
├── Dockerfile
└── requirements.txt

Dockerfile:

FROM python:3.7

RUN apt-get update \
    && apt-get install -y \
        build-essential \
        cmake \
        git \
        wget \
        unzip \
        yasm \
        pkg-config \
        libswscale-dev \
        libtbb2 \
        libtbb-dev \
        libjpeg-dev \
        libpng-dev \
        libtiff-dev \
        libavformat-dev \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*

RUN pip install numpy

WORKDIR /
ENV OPENCV_VERSION="4.1.1"
RUN wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip \
&& unzip ${OPENCV_VERSION}.zip \
&& mkdir /opencv-${OPENCV_VERSION}/cmake_binary \
&& cd /opencv-${OPENCV_VERSION}/cmake_binary \
&& cmake -DBUILD_TIFF=ON \
  -DBUILD_opencv_java=OFF \
  -DWITH_CUDA=OFF \
  -DWITH_OPENGL=ON \
  -DWITH_OPENCL=ON \
  -DWITH_IPP=ON \
  -DWITH_TBB=ON \
  -DWITH_EIGEN=ON \
  -DWITH_V4L=ON \
  -DBUILD_TESTS=OFF \
  -DBUILD_PERF_TESTS=OFF \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DCMAKE_INSTALL_PREFIX=$(python3.7 -c "import sys; print(sys.prefix)") \
  -DPYTHON_EXECUTABLE=$(which python3.7) \
  -DPYTHON_INCLUDE_DIR=$(python3.7 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
  -DPYTHON_PACKAGES_PATH=$(python3.7 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
  .. \
&& make install \
&& rm /${OPENCV_VERSION}.zip \
&& rm -r /opencv-${OPENCV_VERSION}
RUN ln -s \
  /usr/local/python/cv2/python-3.7/cv2.cpython-37m-x86_64-linux-gnu.so \
  /usr/local/lib/python3.7/site-packages/cv2.so

RUN apt-get update --fix-missing \
    && apt-get install -y --fix-broken \
        poppler-utils \
        tesseract-ocr \
        libtesseract-dev \
        libleptonica-dev \
        libsm6 \
        libxext6 \
        python-opencv \
    && ldconfig

COPY ./requirements.txt ./ 
RUN pip3 install --upgrade pip
# install dependencies 
RUN pip3 install -r requirements.txt

Build:

docker image build -t my-awesome-py .

Run:

docker run --rm my-awesome-py tesseract
Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.
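The same check can be done from Python, which is closer to how the application itself reaches the binary. This is a minimal sketch using only the standard library; `tesseract_version` is a name chosen here for illustration. It merges stderr into stdout because it is not assumed which stream a given tesseract build writes its version to.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line of `<cmd> --version`, or None if cmd is missing."""
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # capture both streams together
            universal_newlines=True,   # decode output to str
            check=False,
        )
    except (FileNotFoundError, PermissionError):
        # The executable does not exist or cannot be run
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

A return value of `None` inside the container means the binary is not reachable on PATH, which is the same condition that leads to TesseractNotFoundError.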
