PDF Text Extraction Approach Using OCR

Views: 343 | Published: 2020/5/25 4:32:57 | Tags: java, pdf, text-parsing

Problem Description

Has anybody attempted to extract text from a PDF using an OCR library and Java? Which library did you find to be the most reliable for text extraction? Most of the approaches I've seen (Tesseract, GOCR) are native C/C++ libraries that would require some JNI code to be written.

I'm familiar with PDFBox, which is now an Apache incubator project at version 0.8.x, but its text extraction isn't always accurate. I'm looking for an alternative approach that is somewhat more reliable.

I haven't tried Asprise JavaPDF yet, though I'm in the process of doing so, but I wanted to know more about the OCR approach (if it's possible).

Any help would be appreciated.
