Use Tesseract 4 - Docker Container from uwsgi-nginx-flask-docker
My Python project runs locally, and it works there. I call tesseract from Python with the subprocess package.
Then I deployed the project. Since I use Flask, I based the deployment on tiangolo-uwsgi-flask-nginx-docker, but Tesseract isn't installed in that image, so my project no longer works because it cannot find tesseract. It also cannot see the tesseract installed on my AWS instance, because that installation lives on the host and not inside the Docker container.
That's why I would also like to use the tesseract 4 Docker image, which ships with Tesseract installed.
I have both containers running:
c82b61361992 tesseractshadow/tesseract4re:latest "/bin/bash" 6 seconds ago Up 5 seconds t4re
e122633ef81c my_project:latest "/entrypoint.sh /sta 35 minutes ago Up 35 minutes 0.0.0.0:80->80/tcp, 443/tcp modest_perlman
But I don't know how to tell my_project that it has to use the Tesseract from the Tesseract container.
I read this post about connecting two Docker containers, but I only got more lost. :)
I saw that the Tesseract container is supposed to be used this way:
#!/bin/bash
# Confirm that the t4re container is running
docker ps -f name=t4re
# Unique working directory name for this task (shell PID plus nanoseconds)
TASK_TMP_DIR=TASK_$$_$(date +"%N")
echo "====== TASK $TASK_TMP_DIR started ======"
docker exec -it t4re mkdir -p ./$TASK_TMP_DIR/
# Copy the input image into the container
docker cp ./ocr-files/phototest.tif t4re:/home/work/$TASK_TMP_DIR/
# Run OCR inside the container, writing txt, pdf and hocr outputs
docker exec -it t4re /bin/bash -c "mkdir -p ./$TASK_TMP_DIR/out/; cd ./$TASK_TMP_DIR/out/; tesseract ../phototest.tif phototest -l eng --psm 1 --oem 2 txt pdf hocr"
# Copy the results back to the host
mkdir -p ./ocr-files/output/$TASK_TMP_DIR/
docker cp t4re:/home/work/$TASK_TMP_DIR/out/ ./ocr-files/output/$TASK_TMP_DIR/
# Clean up the working directory inside the container
docker exec -it t4re rm -r ./$TASK_TMP_DIR/
docker exec -it t4re ls
echo "====== Result files were copied to ./ocr-files/output/$TASK_TMP_DIR/ ======"
But I have no clue how to do this from my Python script, which runs in the other container.
My python-tesseract script looks quite similar to pytesseract.py; I just changed a few lines and deleted some things I don't need.
Maybe someone knows how to do this, or can suggest a better way to use tesseract with the tiangolo Docker image.
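For illustration, the shell steps above could be driven from Python roughly as follows. This is only a sketch under some assumptions: the code runs somewhere the docker CLI can reach the t4re container (on the host, or in a container that has the docker client installed and the Docker socket mounted), and the helper names build_ocr_commands and run_ocr are made up for this example. The -it flags are dropped because subprocess normally has no TTY attached, and absolute paths replace the cd step.

```python
import os
import subprocess
import uuid

CONTAINER = "t4re"        # container name taken from the `docker ps` output above
WORK_DIR = "/home/work"   # working directory used by the shell script above


def build_ocr_commands(image_path, output_dir, task_id=None):
    """Return the docker commands (as argument lists) for one OCR task.

    Hypothetical helper: it mirrors the shell script step by step.
    """
    task_id = task_id or f"TASK_{uuid.uuid4().hex}"
    task_dir = f"{WORK_DIR}/{task_id}"
    image_name = os.path.basename(image_path)
    base_name = os.path.splitext(image_name)[0]
    return [
        # Create the task directory and its output subdirectory in the container
        ["docker", "exec", CONTAINER, "mkdir", "-p", f"{task_dir}/out"],
        # Copy the input image into the container
        ["docker", "cp", str(image_path), f"{CONTAINER}:{task_dir}/"],
        # Run tesseract with the same options as the shell script
        ["docker", "exec", CONTAINER, "tesseract",
         f"{task_dir}/{image_name}", f"{task_dir}/out/{base_name}",
         "-l", "eng", "--psm", "1", "--oem", "2", "txt", "pdf", "hocr"],
        # Copy the contents of the output directory back
        ["docker", "cp", f"{CONTAINER}:{task_dir}/out/.", str(output_dir)],
        # Remove the task directory inside the container
        ["docker", "exec", CONTAINER, "rm", "-r", task_dir],
    ]


def run_ocr(image_path, output_dir):
    """Run all steps in order, stopping at the first command that fails."""
    os.makedirs(output_dir, exist_ok=True)
    for cmd in build_ocr_commands(image_path, output_dir):
        subprocess.run(cmd, check=True)
```

Calling run_ocr("./ocr-files/phototest.tif", "./ocr-files/output/task") would then leave the txt, pdf and hocr files in the output directory. Keeping the command construction separate from the execution makes it possible to inspect the commands without a running Docker daemon.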
EDIT (See the edit below)
I found the answer. Since it works for any two Docker containers, I'll write it up as a general solution that can be reused.
I have both Docker images and their containers on the same instance:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
14524d364cff (image) "java -jar ..." 40 hours ago Up 40 hours 0.0.0.0:5000->5000/tcp api-1
3392994ae3ac (image) "java -jar ..." 40 hours ago Up 40 hours 0.0.0.0:5002->5002/tcp api-2
Up to here it's easy.
Then I wrote a docker-compose.yml:
version: '2'
services:
  api-1:
    image: _name-of-image_
    container_name: api-1
    ports:
      - "5000:5000"
    depends_on:
      - api-2
  api-2:
    image: _name-of-image_
    container_name: api-2
    ports:
      - "5002:5002"
Then, in the Dockerfile of api-1, for example:
...
ENV API-2HOST api-2
...
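And that's it. Compose attaches both services to a shared default network, so the service name api-2 resolves as a hostname from inside the api-1 container.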
In my particular case, I have an api-1.conf with:
accounts = {
  http = {
    host = "localhost"
    host = ${?API-2HOST}
    port = 5002
    poolBufferSize = 100
    routes = {
      authentication = "/authentication"
      login = "/login/"
      logout = "/logout"
      refreshTokens = "/refreshTokens"
    }
  }
}
With that in place I can easily send requests to api-2, so the two Docker containers can communicate.
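The same "default to localhost, override from the environment" pattern can be sketched in Python, which may be closer to the Flask project in the question. This is an illustration, not the author's code: the function name api2_base_url and the http scheme are assumptions, while the variable name API-2HOST and port 5002 come from the configuration above.

```python
import os


def api2_base_url(env=os.environ):
    """Build the base URL of api-2 (hypothetical helper).

    Falls back to localhost when API-2HOST is not set, like the two
    `host` lines in the config above.
    """
    host = env.get("API-2HOST", "localhost")
    port = 5002  # port published by api-2 in docker-compose.yml
    return f"http://{host}:{port}"  # the http scheme is an assumption


if __name__ == "__main__":
    # Inside the api-1 container this prints http://api-2:5002;
    # on a machine without the variable it prints http://localhost:5002.
    print(api2_base_url())
```

Passing the environment in as a parameter keeps the lookup easy to check without changing the real process environment.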
Hope it can help someone.
EDIT
Since running two containers can get complicated, I created a git project with a single Dockerfile that provides Flask, nginx, uWSGI and Tesseract together, so there's no need to use two containers.
docker-flask-nginx-uwsgi-tesseract
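Once tesseract is installed in the same image as the Flask app, the Python side can call it directly with subprocess, as in the original local setup. The following is a minimal sketch under that assumption; the function names build_tesseract_command and ocr_to_text are invented for the example, and it uses tesseract's stdout output target so that no temporary output file is needed.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=1):
    """Return the tesseract argument list for printing recognized text to stdout."""
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_to_text(image_path, lang="eng", psm=1):
    """Run the local tesseract binary and return the recognized text."""
    if shutil.which("tesseract") is None:
        # Fails early with a clear message if the image lacks tesseract
        raise RuntimeError("tesseract not found on PATH; install it in the image")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A Flask view could then call ocr_to_text on an uploaded file's path and return the text in the response.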