API to read text from an image file using OCR

Views: 286 · Published: 2018/12/10 22:32:48 · Tags: java, ocr


Problem description

I am looking for example code, or the name of an API, for OCR (optical character recognition) in Java that I can use to extract all the text present in an image file, without comparing it against reference images, which is what I am doing with the code below.

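import java.io.File;

// OCR, ImageBinaryGrey and Capture come from the OCR library the asker is using
// (the post does not name it), so their imports are omitted here.
public class OCRTest {

    static String STR = "";

    public static void main(String[] args) {
        OCR l = new OCR(0.70f);
        // Load the font images that recognition is matched against
        l.loadFontsDirectory(OCRTest.class, new File("fonts"));
        l.loadFont(OCRTest.class, new File("fonts", "font_1"));
        ImageBinaryGrey i = new ImageBinaryGrey(Capture.load(OCRTest.class, "full.png"));
        // Recognize text only in the region given by the four coordinates, using font_1
        STR = l.recognize(i, 1285, 654, 1343, 677, "font_1");
        System.out.println(STR);
    }
}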

Recommended answer

You can try Tess4j or the JavaCPP Presets for Tesseract. I prefer the latter, as it is easier to use than the former. Add the dependency to your pom:

        <dependency>
            <groupId>org.bytedeco.javacpp-presets</groupId>
            <artifactId>tesseract-platform</artifactId>
            <version>3.04.01-1.3</version>
        </dependency>

The code is simple:

import org.bytedeco.javacpp.*;
import static org.bytedeco.javacpp.lept.*;
import static org.bytedeco.javacpp.tesseract.*;

public class BasicExample {
    public static void main(String[] args) {
        BytePointer outText;

        TessBaseAPI api = new TessBaseAPI();
        // Initialize tesseract-ocr with English, without specifying tessdata path
        if (api.Init(null, "eng") != 0) {
            System.err.println("Could not initialize tesseract.");
            System.exit(1);
        }

        // Open input image with leptonica library
        PIX image = pixRead(args.length > 0 ? args[0] : "/usr/src/tesseract/testing/phototest.tif");
        api.SetImage(image);
        // Get OCR result
        outText = api.GetUTF8Text();
        System.out.println("OCR output:\n" + outText.getString());

        // Destroy used object and release memory
        api.End();
        outText.deallocate();
        pixDestroy(image);
    }
}
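
The example above passes either `args[0]` or a default path straight to `pixRead`. As a small sketch of a pre-check you could run first, the class below uses only the Java standard library to pick the path and confirm it points to a readable file. `ImagePathCheck`, `resolveImagePath` and `isReadableImageFile` are hypothetical names introduced here for illustration; they are not part of Tesseract, Leptonica or JavaCPP.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImagePathCheck {

    // Default taken from the example above; adjust to your own test image.
    static final String DEFAULT_IMAGE = "/usr/src/tesseract/testing/phototest.tif";

    // Hypothetical helper: use the first command-line argument if present,
    // otherwise fall back to the given default path.
    static String resolveImagePath(String[] args, String defaultPath) {
        return args.length > 0 ? args[0] : defaultPath;
    }

    // Returns true only if the path is an existing, readable regular file.
    // This does not verify that the file is a valid image.
    static boolean isReadableImageFile(String path) {
        Path p = Paths.get(path);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        String path = resolveImagePath(args, DEFAULT_IMAGE);
        if (isReadableImageFile(path)) {
            System.out.println("Readable file, ready for OCR: " + path);
        } else {
            System.out.println("Not a readable file, skipping OCR: " + path);
        }
    }
}
```

In `BasicExample`, the same check could guard the `pixRead` call so that a wrong path is reported before any image is handed to `SetImage`.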

Tess4j is a little more complex, as it requires a specific VC++ redistributable package to be installed.
