Tesseract OCR not working for 64 bit machine

Problem description

I am working on an application in which I am using Tesseract for OCR.

My code works fine on 32-bit Windows. But when I run the same code on a 64-bit machine using the 32-bit .dll files, the code runs but does not give accurate results.

So I switched to the 64-bit .dll files on the 64-bit machine. Now when I run the same program, I get the following error in the console (Eclipse Kepler):

Exception in thread "AWT-EventQueue-0" java.lang.UnsatisfiedLinkError: %1 is not a valid Win32 application.
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.Native.open(Native.java:1759)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
at com.sun.jna.Library$Handler.<init>(Library.java:147)
at com.sun.jna.Native.loadLibrary(Native.java:412)
at com.sun.jna.Native.loadLibrary(Native.java:391)
at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:38)
at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:293)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:227)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:176)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:159)

I have downloaded the 64-bit .dll files (https://github.com/charlesw/tesseract/tree/master/src/lib/TesseractOcr/x64), which are compatible with 64-bit systems, but I still get the same error. I am using GhostScript 8.71 on the 64-bit machine. I have installed it in both Program Files and Program Files (x86), and I have set the environment variables accordingly. But it is still not working.

Please suggest a solution!

Recommended answer

Tess4J currently supports only 32-bit JVMs.

This was the response from its creator, nguyenq, to a similar question on the SourceForge forum.

Similarly, the tutorial points out that only 32-bit DLLs are included in the distribution.

To run on a 64-bit JVM, you need 64-bit Tesseract and Leptonica DLLs.
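The "%1 is not a valid Win32 application" error typically means the JVM and the native DLL it tried to load have different bitness. As a minimal sketch (a hypothetical helper, not part of Tess4J), you can check what a DLL was built for by reading the Machine field of its PE header: 0x014C means 32-bit x86 and 0x8664 means x64.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper (not part of Tess4J): reports a DLL's target machine.
public class DllBitness {

    /** Returns "x86", "x64", or "unknown (0x....)" for the Machine field of a PE image. */
    static String machineOf(byte[] image) {
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        // Every PE file starts with the DOS "MZ" signature.
        if (image.length < 0x40 || image[0] != 'M' || image[1] != 'Z') {
            throw new IllegalArgumentException("not a PE file (missing MZ header)");
        }
        // e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
        int peOffset = buf.getInt(0x3C);
        if (peOffset < 0 || peOffset + 6 > image.length
                || buf.getInt(peOffset) != 0x00004550) { // "PE\0\0" read little-endian
            throw new IllegalArgumentException("not a PE file (missing PE signature)");
        }
        // The 2-byte Machine field immediately follows the PE signature.
        int machine = buf.getShort(peOffset + 4) & 0xFFFF;
        switch (machine) {
            case 0x014C: return "x86";
            case 0x8664: return "x64";
            default: return String.format("unknown (0x%04X)", machine);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DllBitness path\to\libtesseract.dll path\to\liblept.dll
        for (String path : args) {
            System.out.println(path + ": " + machineOf(Files.readAllBytes(Paths.get(path))));
        }
    }
}
```

If the DLLs report x64 but the error persists, the JVM itself is the next thing to check.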

One solution: tell your IDE to use a 32-bit JVM instead.

The downside is that you may be mixing 32-bit and 64-bit environments, which could get odd in a complicated app or environment. (I don't think it's too bad, but it might be a pain in your IDE.)
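If you go the 32-bit JVM route, it helps to confirm which JVM Eclipse actually launched, because the JVM's bitness (not the OS's) decides which DLLs can load. A minimal sketch, assuming a HotSpot/OpenJDK-style JVM: `sun.arch.data.model` is implementation-specific, so the hypothetical helper below falls back to a heuristic on `os.arch`, which reports the JVM's architecture rather than the operating system's.

```java
// Hypothetical diagnostic helper (not part of Tess4J): prints the running JVM's bitness.
public class JvmBitness {

    /** Best-effort guess at the JVM data model, "32" or "64". */
    static String dataModel() {
        // HotSpot/OpenJDK-specific property; may be absent on other JVMs.
        String model = System.getProperty("sun.arch.data.model");
        if (model != null && !model.equals("unknown")) {
            return model;
        }
        // Heuristic fallback: e.g. "amd64", "x86_64", "aarch64" vs. "x86", "i386".
        String arch = System.getProperty("os.arch", "");
        return arch.contains("64") ? "64" : "32";
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("os.arch   = " + System.getProperty("os.arch"));
        System.out.println("JVM data model: " + dataModel() + "-bit");
    }
}
```

Running this from the same Eclipse run configuration as the OCR app shows whether the 32-bit JVM setting actually took effect.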

In an update found here, it seems you can find DLLs for 64-bit Java here, as part of the Tesseract wrapper for .NET (oddly enough). However, I haven't tried those 64-bit DLLs yet, and according to the SourceForge link they depend on the Visual C++ Redistributable for VS2012 or the Visual C++ Redistributable for VS2013... which sucks.

I'll update this post if I figure out a cleaner solution.

Update

Note that I did this on Amazon Web Services instances.

I was able to get Tess4J to work on a 64-bit Ubuntu 14.04. It was actually very simple once I gave up on my Red Hat distro and went to Ubuntu.

sudo apt-get install tesseract-ocr will set up tesseract completely. You can check by typing tesseract -v. I also needed GhostScript because I was working with PDFs; sudo apt-get install ghostscript again got everything set up. Verify with gs -v.
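The same `tesseract -v` / `gs -v` checks can be run from inside the Java app, which confirms the tools are visible to the JVM's PATH and not just to your shell. A minimal sketch, assuming Java 11+; the class and method names are hypothetical. Stderr is merged into stdout in case a tool prints its version there.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper (not part of Tess4J): runs "<tool> -v" and reports the result.
public class ToolCheck {

    /** Returns the first output line of "command -v", or empty if the process cannot start. */
    static Optional<String> version(String command) {
        // Merge stderr into stdout in case the tool prints its version to stderr.
        ProcessBuilder pb = new ProcessBuilder(command, "-v").redirectErrorStream(true);
        try {
            Process process = pb.start();
            String output;
            try (InputStream in = process.getInputStream()) {
                output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            process.waitFor();
            return Optional.of(output.lines().findFirst().orElse("").trim());
        } catch (IOException e) {
            // Usually means the executable is not on the PATH.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (String tool : new String[] {"tesseract", "gs"}) {
            System.out.println(tool + ": " + version(tool).orElse("NOT FOUND on PATH"));
        }
    }
}
```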

Now, in your Java app, all you need on your classpath are the JARs from the Tess4J download: jna-4.1.0.jar, jai_imageio.jar, tess4j.jar, and ghost4j-0.5.1.jar if you are working with PDFs.
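To confirm those JARs really are on the runtime classpath, a small hypothetical check can look up one class from each. It uses `Class.forName(name, false, loader)` so the classes are found but not initialized; that matters because, as the stack trace shows, initializing `TessAPI` is what loads the native DLL. `com.sun.jna.Native` and `net.sourceforge.tess4j.Tesseract` come from the stack trace above; `org.ghost4j.Ghostscript` is assumed to be a representative ghost4j class.

```java
// Hypothetical diagnostic helper (not part of Tess4J): checks that key classes are on the classpath.
public class ClasspathCheck {

    /** True if the named class can be found, without running its static initializer. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "com.sun.jna.Native",               // from the JNA jar
            "net.sourceforge.tess4j.Tesseract", // from tess4j.jar
            "org.ghost4j.Ghostscript"           // assumed ghost4j class; only needed for PDFs
        };
        for (String name : required) {
            System.out.println((isPresent(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```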

In your Java app, you need to set the data path so your Tesseract instance knows where tesseract is installed. Even with the environment variable set, it never worked for me. I needed to set the data path explicitly, like so:

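import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;

Tesseract tessInstance = Tesseract.getInstance();
tessInstance.setDatapath(System.getenv("TESSDATA_PREFIX")); // parent folder of tessdata/
ImageIO.scanForPlugins(); // make sure it knows about GhostScript, to work with PDFs
String result = tessInstance.doOCR(myFile); // myFile is a java.io.File; throws TesseractException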
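`ImageIO.scanForPlugins()` makes ImageIO pick up reader plugins from JARs on the classpath. To see what actually got registered, a small diagnostic (hypothetical, not part of Tess4J) can list the reader format names. With jai_imageio.jar present you would expect TIFF among them, although recent JDKs (9 and later) also ship a TIFF reader of their own.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.imageio.ImageIO;

// Hypothetical diagnostic helper (not part of Tess4J): lists ImageIO reader formats.
public class ImageIOReaders {

    /** Re-scans the classpath for ImageIO plugins and returns lower-cased readable format names. */
    static Set<String> readableFormats() {
        ImageIO.scanForPlugins();
        Set<String> formats = new TreeSet<>();
        for (String name : ImageIO.getReaderFormatNames()) {
            formats.add(name.toLowerCase(Locale.ROOT));
        }
        return formats;
    }

    public static void main(String[] args) {
        Set<String> formats = readableFormats();
        System.out.println("Readable formats: " + formats);
        System.out.println("TIFF reader available: " + formats.contains("tiff"));
    }
}
```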

Make sure setDatapath() points to the parent folder of the tessdata folder of your tesseract installation (on my Ubuntu this was /usr/share/tesseract-ocr/).
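Because setDatapath() expects the directory that contains tessdata/ rather than tessdata/ itself, it can help to validate the path before handing it over. A minimal sketch with a hypothetical helper: it tries `TESSDATA_PREFIX` first, then the Ubuntu 14.04 location from this answer, and returns the first candidate that actually has a tessdata subfolder. Other systems will need other fallback paths.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper (not part of Tess4J): finds a valid argument for setDatapath().
public class TessDataPath {

    // Fallback from the Ubuntu 14.04 install described above; adjust for other systems.
    static final String UBUNTU_DEFAULT = "/usr/share/tesseract-ocr/";

    /** Returns the first candidate directory that contains a "tessdata" subdirectory. */
    static Optional<String> resolve(String... candidates) {
        for (String candidate : candidates) {
            if (candidate == null || candidate.isEmpty()) {
                continue; // e.g. TESSDATA_PREFIX not set
            }
            Path dir = Paths.get(candidate);
            if (Files.isDirectory(dir.resolve("tessdata"))) {
                return Optional.of(dir.toString());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> path = resolve(System.getenv("TESSDATA_PREFIX"), UBUNTU_DEFAULT);
        System.out.println(path.map(p -> "use setDatapath(\"" + p + "\")")
                               .orElse("no directory with a tessdata/ subfolder found"));
    }
}
```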

That was all I needed. No worrying about DLLs on the classpath.

tl;dr:

Use up-to-date Ubuntu

sudo apt-get install tesseract-ocr

sudo apt-get install ghostscript if working with PDFs

include the proper Tess4J JARs (jna-4.1.0.jar, jai_imageio.jar, tess4j.jar, and ghost4j-0.5.1.jar if you are working with PDFs)

call tess.setDatapath() to point to your tesseract installation (/usr/share/tesseract-ocr/ on my Ubuntu 14.04)

ImageIO.scanForPlugins() if using GhostScript

That's it. You are good to go and can happily call tess.doOCR(myFile).
