Tesseract OCR not working on a 64-bit machine


Problem description


I am working on an application that uses Tesseract for OCR.

My code works fine on a 32-bit Windows system. But when I run the same code on a 64-bit machine using the 32-bit .dll files, the code runs but does not give accurate results.

So I switched to the 64-bit .dll files on the 64-bit machine. When I ran the same program again, I got the following error in the Console (Eclipse Kepler).

Exception in thread "AWT-EventQueue-0" java.lang.UnsatisfiedLinkError: %1 is not a valid Win32 application.
at com.sun.jna.Native.open(Native Method)
at com.sun.jna.Native.open(Native.java:1759)
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:260)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:398)
at com.sun.jna.Library$Handler.<init>(Library.java:147)
at com.sun.jna.Native.loadLibrary(Native.java:412)
at com.sun.jna.Native.loadLibrary(Native.java:391)
at net.sourceforge.tess4j.TessAPI.<clinit>(TessAPI.java:38)
at net.sourceforge.tess4j.Tesseract.init(Tesseract.java:293)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:227)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:176)
at net.sourceforge.tess4j.Tesseract.doOCR(Tesseract.java:159)

I have downloaded the 64-bit .dll files (https://github.com/charlesw/tesseract/tree/master/src/lib/TesseractOcr/x64), which should be compatible with a 64-bit system, but I still get the same error. I am using GhostScript v8.71 on the 64-bit machine. I have installed it in both Program Files and Program Files (x86), and I have set the environment variables accordingly, but it still does not work.

Can anyone suggest a solution?

Solution

Tess4J currently supports only a 32-bit JVM.

The creator, nguyenq, said so when responding to a similar issue on a SourceForge forum.

Similarly, the tutorial points out that only 32-bit DLLs are included in the distribution.

To run with a 64-bit JVM, you'll need to use 64-bit Tesseract and Leptonica DLLs.

One solution: tell your IDE to use a 32-bit JVM instead.

The downside is that you may end up mixing 32-bit and 64-bit environments, which could get odd in a complicated app or environment. (I don't think it's too bad, but it might be a pain in your IDE.)
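If you are not sure which JVM your IDE actually launches, a quick check from inside the application can help. The following is a minimal sketch, not part of Tess4J: the class name JvmBitness is made up for illustration, and it assumes a HotSpot-style JVM that exposes the sun.arch.data.model system property, falling back to os.arch (which describes the JVM's architecture rather than the operating system's).

```java
import java.util.Locale;

/** Illustrative helper (hypothetical name): reports whether the running JVM is 32-bit or 64-bit. */
final class JvmBitness {

    /**
     * Returns 32 or 64, or 0 if the bitness cannot be determined.
     * dataModel is the value of "sun.arch.data.model" (may be null on non-HotSpot JVMs),
     * osArch is the value of "os.arch".
     */
    static int detect(String dataModel, String osArch) {
        if ("32".equals(dataModel)) return 32;
        if ("64".equals(dataModel)) return 64;
        if (osArch == null) return 0;
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("x86") || arch.equals("i386") || arch.equals("i686")) return 32;
        if (arch.contains("64")) return 64; // amd64, x86_64, aarch64, ...
        return 0;
    }

    /** Convenience overload that reads the properties of the current JVM. */
    static int detect() {
        return detect(System.getProperty("sun.arch.data.model"),
                      System.getProperty("os.arch"));
    }

    public static void main(String[] args) {
        System.out.println("JVM bitness: " + detect());
    }
}
```

Run it with the same run configuration as the OCR application. If it reports 64 while the DLLs on the library path are 32-bit (or the other way round), that mismatch is consistent with the "%1 is not a valid Win32 application" error.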

According to a later update, it seems you can find DLLs for 64-bit Java as part of the Tesseract wrapper for .NET (oddly enough). However, I haven't tried those 64-bit DLLs yet, and the SourceForge link says they depend on the Visual C++ Redistributable for VS2012 or the Visual C++ Redistributable for VS2013, which is an annoying extra dependency.
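Before swapping DLLs around, it can be worth confirming which architecture a given DLL was built for. The sketch below reads the Machine field from the file's PE/COFF header, where 0x014c means x86 (32-bit) and 0x8664 means x64 (64-bit). The class name DllArch is hypothetical and the code is only an illustration; it loads the whole file into memory, which is fine for DLLs of a few megabytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative helper (hypothetical name): tells whether a Windows DLL is 32-bit or 64-bit. */
final class DllArch {

    /** Returns the COFF Machine value of a PE image, e.g. 0x014c (x86) or 0x8664 (x64). */
    static int machineOf(byte[] image) {
        ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
        // DOS header: starts with "MZ" and stores the PE header offset at 0x3C.
        if (image.length < 0x40 || buf.getShort(0) != 0x5A4D) {
            throw new IllegalArgumentException("missing MZ header");
        }
        int pe = buf.getInt(0x3C);
        // PE signature "PE\0\0", followed directly by the 2-byte Machine field.
        if (pe < 0 || pe > image.length - 6 || buf.getInt(pe) != 0x00004550) {
            throw new IllegalArgumentException("missing PE signature");
        }
        return buf.getShort(pe + 4) & 0xFFFF;
    }

    static String describe(int machine) {
        switch (machine) {
            case 0x014c: return "32-bit (x86)";
            case 0x8664: return "64-bit (x64)";
            default:     return "other (0x" + Integer.toHexString(machine) + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more DLL paths on the command line.
        for (String path : args) {
            byte[] image = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": " + describe(machineOf(image)));
        }
    }
}
```

Every native library that gets loaded has to match the JVM, so check the DLLs that Tesseract depends on (such as Leptonica) as well as the Tesseract DLL itself.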

I'll update this post if I figure out a cleaner solution.

UPDATE

Note that I did this on Amazon Web Services instances.

I was able to get Tess4J to work on 64-bit Ubuntu 14.04. It was actually very simple once I gave up on my Red Hat distro and went to Ubuntu.

Running sudo apt-get install tesseract-ocr sets tesseract up completely. You can check by typing tesseract -v. I also needed GhostScript because I was working with PDFs; sudo apt-get install ghostscript took care of that. Verify with gs -v.
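The same two checks can be done from Java at application startup, which is handy on a freshly provisioned server. This is a minimal sketch with a made-up class name (ToolCheck); it assumes that tesseract -v and gs -v exit with status 0 when the tools are installed, and it treats a missing command, a non-zero exit code, or a timeout as "not available".

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Illustrative helper (hypothetical name): checks that external tools can be launched. */
final class ToolCheck {

    /** Returns true if the command starts and exits with status 0 within 10 seconds. */
    static boolean runs(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .inheritIO() // show the tool's version banner in our own console
                    .start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // command not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("tesseract available:   " + runs("tesseract", "-v"));
        System.out.println("ghostscript available: " + runs("gs", "-v"));
    }
}
```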

Now in your Java app, all you need to include on your classpath are the JARs from Tess4J's download: jna-4.1.0.jar, jai_imageio.jar, tess4j.jar, and ghost4j-0.5.1.jar if you are working with PDF.
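A missing JAR usually only shows up as a NoClassDefFoundError at the first OCR call, so a small classpath check can save some guessing. The sketch below is an illustration with a hypothetical class name (ClasspathCheck). The two class names it probes are taken from the stack trace in the question; it loads them without initializing them, so no native library is touched.

```java
/** Illustrative helper (hypothetical name): checks whether required classes are on the classpath. */
final class ClasspathCheck {

    /** Returns true if the class can be found, without running its static initializers. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "net.sourceforge.tess4j.Tesseract", // from tess4j.jar
            "com.sun.jna.Native"                // from the JNA jar
        };
        for (String name : required) {
            System.out.println(name + ": " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```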

In your Java app, you need to set the data path so your Tesseract instance knows where tesseract is installed. Even with the environment variable set, it never worked for me. I needed to explicitly set the data path like so:

Tesseract tessInstance = Tesseract.getInstance();
tessInstance.setDatapath(System.getenv("TESSDATA_PREFIX"));
ImageIO.scanForPlugins(); // make sure it knows about GhostScript, to work with PDFs
String result = tessInstance.doOCR(myFile);

Be sure that setDatapath() points to the parent folder of the tessdata folder of your tesseract installation (on my Ubuntu this was /usr/share/tesseract-ocr/).
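Since a wrong data path is easy to get subtly wrong (pointing at tessdata itself instead of its parent, for example), it may help to validate the folder before handing it to setDatapath(). The following sketch uses a made-up class name (DatapathCheck) and relies only on the fact that Tesseract language files end in .traineddata; it does not call Tess4J at all.

```java
import java.io.File;

/** Illustrative helper (hypothetical name): sanity-checks a Tesseract data path. */
final class DatapathCheck {

    /** Returns true if dir contains a "tessdata" folder with at least one *.traineddata file. */
    static boolean looksValid(File dir) {
        File tessdata = new File(dir, "tessdata");
        // list() returns null when tessdata does not exist or is not a directory.
        String[] models = tessdata.list((d, name) -> name.endsWith(".traineddata"));
        return models != null && models.length > 0;
    }

    public static void main(String[] args) {
        // Use the first argument if given, otherwise fall back to TESSDATA_PREFIX.
        String path = args.length > 0 ? args[0] : System.getenv("TESSDATA_PREFIX");
        if (path == null) {
            System.out.println("No path given and TESSDATA_PREFIX is not set");
            return;
        }
        System.out.println(path + (looksValid(new File(path))
                ? " looks like a valid data path"
                : " does not contain tessdata/*.traineddata"));
    }
}
```

With the Ubuntu layout described above, passing /usr/share/tesseract-ocr/ should pass the check if the language data was installed there, while passing /usr/share/tesseract-ocr/tessdata should fail it.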

That was all I needed. No worrying about DLLs on the classpath.

tl;dr:

Use an up-to-date Ubuntu

sudo apt-get install tesseract-ocr

sudo apt-get install ghostscript if working with PDF

include the proper Tess4J JARs (jna-4.1.0.jar, jai_imageio.jar, tess4j.jar, and ghost4j-0.5.1.jar if you are working with PDF)

call tess.setDatapath() to point to your tesseract installation (/usr/share/tesseract-ocr/ for my Ubuntu 14.04)

ImageIO.scanForPlugins() if using GhostScript

That's it. You are good to go: call tess.doOCR(myFile).
