Use Tesseract 4 - Docker Container from uwsgi-nginx-flask-docker

Problem description

I had my Python project running locally, and it worked. I call tesseract from Python with the subprocess module.

Then I deployed my project. Since I use Flask, I used tiangolo's uwsgi-nginx-flask-docker image, but Tesseract isn't installed in it. That's why my project doesn't work anymore: it cannot find tesseract. It also doesn't pick up the tesseract installed on my AWS instance, because that installation lives on the host, not inside the Docker container.
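
A minimal sketch of the subprocess call described above, assuming the `tesseract` binary is expected on `PATH`. The helper names (`find_tesseract`, `build_tesseract_cmd`, `ocr_image`) are hypothetical and are not pytesseract's API. Checking with `shutil.which` first turns a missing binary inside the container into an explicit error message instead of a bare failure from `subprocess`.

```python
import shutil
import subprocess
from typing import List


def find_tesseract(binary: str = "tesseract") -> str:
    """Return the full path of the tesseract binary, or raise a clear error."""
    path = shutil.which(binary)
    if path is None:
        # This is the situation inside an image that has no Tesseract installed.
        raise FileNotFoundError(
            f"{binary!r} not found on PATH; is it installed in this container?"
        )
    return path


def build_tesseract_cmd(binary: str, image_path: str, lang: str = "eng") -> List[str]:
    """Build the argument list; the output base "stdout" makes tesseract print the text."""
    return [binary, image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on image_path and return the recognized text."""
    cmd = build_tesseract_cmd(find_tesseract(), image_path, lang)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Run inside the uwsgi-nginx-flask image without Tesseract, `ocr_image(...)` fails immediately with the "not found on PATH" message, which makes the deployment problem easy to spot in the logs.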

That's why I would also like to use the tesseract 4 Docker image, which comes with Tesseract installed.

I have both containers running:

c82b61361992        tesseractshadow/tesseract4re:latest   "/bin/bash"            6 seconds ago       Up 5 seconds                                      t4re
e122633ef81c        my_project:latest                 "/entrypoint.sh /sta   35 minutes ago      Up 35 minutes       0.0.0.0:80->80/tcp, 443/tcp   modest_perlman

But I don't know how to tell my_project that it has to use Tesseract from the Tesseract container.

I read this post about connecting two Docker containers, but it left me even more lost. :)

I saw that the Tesseract Docker image is meant to be used like this:

#!/bin/bash
# Run OCR on ./ocr-files/phototest.tif inside the running "t4re" container.
# -it is dropped from docker exec: it needs a TTY and fails with
# "the input device is not a TTY" when the script runs non-interactively.
docker ps -f name=t4re
TASK_TMP_DIR="TASK_$$_$(date +"%N")"   # unique per-run work directory
WORK="/home/work/$TASK_TMP_DIR"        # same directory docker cp targets
echo "====== TASK $TASK_TMP_DIR started ======"
docker exec t4re mkdir -p "$WORK/out"
docker cp ./ocr-files/phototest.tif "t4re:$WORK/"
docker exec t4re /bin/bash -c "cd '$WORK/out' && tesseract ../phototest.tif phototest -l eng --psm 1 --oem 2 txt pdf hocr"
mkdir -p "./ocr-files/output/$TASK_TMP_DIR/"
docker cp "t4re:$WORK/out/" "./ocr-files/output/$TASK_TMP_DIR/"
docker exec t4re rm -r "$WORK"           # clean up inside the container
docker exec t4re ls
echo "====== Result files were copied to ./ocr-files/output/$TASK_TMP_DIR/ ======"

But I have no clue how to implement this in my Python script, running from inside the other container.
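
One possible way to run the same workflow from Python is to translate each docker command into a `subprocess` argument list. This is a sketch under assumptions. The Flask container would need the `docker` CLI installed and access to the host's Docker daemon, typically by mounting `/var/run/docker.sock`, which effectively gives it root-level control of the host. The container name `t4re` and the `/home/work` directory come from the script above, while `build_ocr_commands` and `run_ocr` are hypothetical names. `-it` is omitted because no terminal is attached.

```python
import subprocess
import uuid
from pathlib import Path
from typing import List

CONTAINER = "t4re"        # container name from the docker ps output above
WORK_ROOT = "/home/work"  # assumed work directory inside the Tesseract container


def build_ocr_commands(local_image: str, task_dir: str, out_dir: str,
                       container: str = CONTAINER) -> List[List[str]]:
    """Return the docker commands of the bash script as argument lists."""
    remote = f"{WORK_ROOT}/{task_dir}"
    name = Path(local_image).name   # e.g. phototest.tif
    stem = Path(local_image).stem   # e.g. phototest (tesseract output base)
    return [
        ["docker", "exec", container, "mkdir", "-p", f"{remote}/out"],
        ["docker", "cp", local_image, f"{container}:{remote}/"],
        # -w sets the working directory of the tesseract process in the container
        ["docker", "exec", "-w", f"{remote}/out", container,
         "tesseract", f"../{name}", stem, "-l", "eng", "--psm", "1", "--oem", "2",
         "txt", "pdf", "hocr"],
        # "out/." copies the directory's contents, not the directory itself
        ["docker", "cp", f"{container}:{remote}/out/.", out_dir],
        ["docker", "exec", container, "rm", "-r", remote],
    ]


def run_ocr(local_image: str, out_dir: str) -> None:
    """Run OCR in the Tesseract container and copy the results to out_dir."""
    task_dir = f"TASK_{uuid.uuid4().hex}"
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmds = build_ocr_commands(local_image, task_dir, out_dir)
    try:
        for cmd in cmds[:-1]:
            subprocess.run(cmd, check=True)
    finally:
        subprocess.run(cmds[-1], check=False)  # always try to clean up
```

Keeping the command construction separate from execution makes it easy to inspect or log the exact docker calls before anything runs.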

My python-tesseract script is quite similar to pytesseract.py; I just changed a few lines and deleted some things I don't need.

Maybe someone knows how to do this, or can suggest a better way to use tesseract with the tiangolo Docker image?

Solution

EDIT (See the edit below)

I found the answer. Since it works for any two Docker containers, I'll write a general solution that can always be used.

I have both Docker images and their containers on the same instance:

CONTAINER ID        IMAGE                 COMMAND             CREATED             STATUS              PORTS                    NAMES
14524d364cff        (image)               "java -jar ..."   40 hours ago        Up 40 hours         0.0.0.0:5000->5000/tcp   api-1
3392994ae3ac        (image)               "java -jar ..."   40 hours ago        Up 40 hours         0.0.0.0:5002->5002/tcp   api-2

Up to here, it's easy.

Then I wrote a docker-compose.yml:

version: '2'
services:
  api-1:
    image: _name-of-image_
    container_name: api-1
    ports:
      - "5000:5000"
    depends_on:
      - api-2

  api-2:
    image: _name-of-image_
    container_name: api-2
    ports:
      - "5002:5002"
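
In a version '2' file like this, `depends_on` only makes Compose start api-2 before api-1; it does not wait until api-2 actually accepts connections. A small client-side retry loop covers that gap. This is a sketch: `wait_for_port` is a hypothetical helper, and inside the Compose network `host` would be the service name, e.g. `api-2`.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Within the Compose network, host can be a service name such as "api-2".
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or unreachable: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, api-1 could call `wait_for_port("api-2", 5002)` once at startup before sending its first request.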

Then, in the Dockerfile of api-1, for example:

...
ENV API-2HOST api-2
...

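That's it. Compose puts both services on a shared default network where each one is reachable by its service name, so `api-2` resolves to the api-2 container.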

In my particular case, I have an api-1.conf with:

accounts = {
  http = {
    host = "localhost"
    host = ${?API-2HOST}
    port = 5002
    poolBufferSize = 100
    routes = {
      authentication = "/authentication"
      login = "/login/"
      logout = "/logout"
      refreshTokens = "/refreshTokens"
    }
  }
}

Then I can easily send requests from api-1 to api-2, so the two Docker containers can communicate.
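
The two `host` lines in the HOCON file mean "use `localhost` unless the `API-2HOST` environment variable is set". Below is a minimal Python sketch of the same fallback, for example for a Flask service calling another container. `api2_url` and `get_from_api2` are hypothetical names; port 5002 and the routes come from the config above. A hyphenated name like `API-2HOST` can be read by the JVM or by Python, but a POSIX shell cannot reference it as `$API-2HOST`, so a name such as `API_2_HOST` is more portable.

```python
import os
import urllib.request
from typing import Mapping, Optional


def api2_url(route: str, env: Optional[Mapping[str, str]] = None,
             port: int = 5002) -> str:
    """Mirror the HOCON fallback: default to localhost, override from API-2HOST."""
    env = os.environ if env is None else env
    host = env.get("API-2HOST", "localhost")
    return f"http://{host}:{port}{route}"


def get_from_api2(route: str, timeout: float = 5.0) -> bytes:
    """Send a GET request to api-2; in Compose the host is the api-2 service."""
    with urllib.request.urlopen(api2_url(route), timeout=timeout) as resp:
        return resp.read()
```

Locally, without the variable, requests go to `localhost:5002`; inside the container, `ENV API-2HOST api-2` redirects them to the api-2 container with no code change.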

Hope it can help someone.

EDIT

Since this can get complicated, I created a git project with just a Dockerfile where you can use Flask, nginx, uWSGI and Tesseract, so there's no need to use two containers.

docker-flask-nginx-uwsgi-tesseract
