How to install Leptonica + Tesseract on Windows without Visual Studio to use in Anaconda?
Question
I want to perform text recognition on images using Python. I have installed Anaconda. Now I want to install Tesseract, but I also need to install Leptonica, and I have not found any clear instructions for doing this on Windows. I do not want to install Visual Studio just to get Leptonica. Could anybody provide clear instructions for installing Leptonica and Tesseract on Windows, without Visual Studio, for use in Anaconda? Thanks.
Recommended answer
Here is a simple set of steps to get the tesseract 3.05 dev version (as of 04/22/2016) working on both Windows 7 and Windows 8 machines:
1- Install tesseract using the executable from the official tesseract-ocr page (version 3.02 for Windows will suffice).
2- Download the following two files for the tesseract 3.05 dev version from http://domasofan.spdns.eu/tesseract/
There are 2 exe files:
- tesseract-core-yyyymmdd.exe: the Tesseract core application without language data
- tesseract-langs-yyyymmdd.exe: all the language data available for Tesseract
(yyyymmdd means a 4-digit year, a 2-digit month and a 2-digit day.)
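If you keep several downloaded installers around, the date stamp in the file name can be read back programmatically to tell which one is newer. The following is a minimal sketch, assuming the file names follow the tesseract-core-yyyymmdd.exe pattern described above; the function name and the example file name in the comment are hypothetical and only for illustration.

```python
import re
from datetime import date, datetime


def date_from_installer_name(filename: str) -> date:
    """Return the yyyymmdd date stamp embedded in an installer file name."""
    # Look for eight digits directly before the .exe extension.
    match = re.search(r"-(\d{8})\.exe$", filename)
    if match is None:
        raise ValueError(f"no yyyymmdd date stamp found in {filename!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(match.group(1), "%Y%m%d").date()


# Example with a made-up file name:
# date_from_installer_name("tesseract-core-20160422.exe")
```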
The app is portable, so you can install it on a USB stick or in another location.
Sub-steps to install these:
- Download the tesseract-core and tesseract-langs packages.
- Double-click the tesseract-core package and extract it to a directory of your choice (for example, a new temporary folder called "Tess_temp").
- Double-click the tesseract-langs package and extract it to the \tessdata subfolder of that same "Tess_temp" folder. For example, if you extracted tesseract-core to c:\Tess_temp, tesseract-langs needs to go to c:\Tess_temp\tessdata.
Now copy everything in "Tess_temp" to the folder where tesseract 3.02 was installed in step 1 above (usually C:\Program Files (x86)\Tesseract-OCR), replacing the 3.02 files with the 3.05 ones.
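Since the goal is to work from Anaconda, the copy step can also be done from Python rather than by hand in Explorer. This is only a sketch: the function name is hypothetical, it relies on shutil.copytree with dirs_exist_ok (Python 3.8 or newer), and the paths in the comment are the example paths from this answer. Writing into Program Files normally requires running Python from an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def overlay_directory(source: Path, target: Path) -> None:
    """Copy everything under source into target.

    Existing folders in target are kept, and files with the same
    name are overwritten by the version found in source.
    """
    shutil.copytree(source, target, dirs_exist_ok=True)


# Example call with the paths used in this answer (adjust to your machine):
# overlay_directory(Path(r"C:\Tess_temp"),
#                   Path(r"C:\Program Files (x86)\Tesseract-OCR"))
```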
It should now work with the 3.05 version on Windows. Copy a sample image test.png (containing text) to this Tesseract-OCR folder, open a cmd window and type in the following commands:
Go to the tesseract folder: cd C:\Program Files (x86)\Tesseract-OCR
Run tesseract on test.png: tesseract -l eng test.png test_text -psm 6
It will show you:
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Congratulations! (Check test_text.txt for the extracted text.)
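To use this from Python in Anaconda, one simple option is to call the same command line through the standard subprocess module. The sketch below assumes the install folder and the command shown above; the function names are hypothetical, and the single-dash -psm flag is the 3.x form used in this answer. It also assumes the output file is UTF-8 text named after the output base with a .txt extension.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng", psm=6,
                            exe="tesseract"):
    """Build the argument list for the command shown above."""
    return [exe, "-l", lang, str(image), str(output_base), "-psm", str(psm)]


def run_ocr(image, output_base, tesseract_dir, lang="eng", psm=6):
    """Run tesseract on an image and return the recognized text."""
    tesseract_dir = Path(tesseract_dir)
    # Call the executable by full path so PATH does not need to be changed.
    exe = str(tesseract_dir / "tesseract.exe")
    cmd = build_tesseract_command(image, output_base, lang, psm, exe)
    # cwd mirrors the "cd" step; check=True raises if tesseract fails.
    subprocess.run(cmd, cwd=tesseract_dir, check=True)
    # tesseract writes its result to <output_base>.txt
    return (tesseract_dir / f"{output_base}.txt").read_text(encoding="utf-8")


# Example call with the folder and file names from this answer:
# text = run_ocr("test.png", "test_text",
#                r"C:\Program Files (x86)\Tesseract-OCR")
# print(text)
```

Passing the arguments as a list avoids quoting problems with the space in "Program Files (x86)".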