Convert image to searchable PDF

Problem description

Hi, I am looking for an open-source Java API that can convert a TIFF image to a searchable PDF (OCR). I have researched around but have found nothing so far.

NOTE: I have looked at this post (Java OCR implementation), but that API does not convert the image to PDF. However, I am still playing with the code a bit.

Recommended answer

You can convert images to PDF using iText. The hard part here is doing the OCR, not creating the PDF.

I will warn you: any OCR engine worth using is going to cost you a significant amount of money. Free and/or open-source ones are generally pet projects, proofs of concept for some algorithm or another, and not suitable for real-world OCR applications. Tesseract is probably the best of the bunch, but even its accuracy is far, far worse than that of the commercial engines.
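If you want to see how far Tesseract gets on your own scans before asking anyone for a quote, a rough sketch like the one below may be enough. It assumes a reasonably recent `tesseract` binary is installed and on the PATH, and it relies on Tesseract's `pdf` output config, which writes a PDF containing the page image with the recognized text layered invisibly over it. The class and method names (`TesseractPdf`, `buildCommand`, `convert`) are made up for this illustration; only the command line itself comes from Tesseract.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to the tesseract command-line tool.
// Assumes "tesseract" is installed and reachable through the PATH.
class TesseractPdf {

    // Builds the command: tesseract <input> <outputBase> pdf
    // Tesseract appends ".pdf" to the output base on its own.
    static List<String> buildCommand(Path input, Path outputBase) {
        return Arrays.asList("tesseract", input.toString(), outputBase.toString(), "pdf");
    }

    // Runs the command and returns the process exit code (0 means success).
    static int convert(Path input, Path outputBase) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, outputBase))
                .inheritIO() // forward Tesseract's own messages to this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TesseractPdf <input.tif> <output-base>");
            return;
        }
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("tesseract exited with code " + exit);
    }
}
```

Running it with `scan.tif` and `scan` as arguments should leave a `scan.pdf` next to the input, provided Tesseract can read the image. How usable the text layer is depends entirely on the recognition accuracy discussed above.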

We have a commercial OCR application, and I've been down this path while evaluating engines. I'd suggest that you bite the bullet, reach out to the engine providers, and get quotes: Abbyy (best accuracy, most expensive, slower), Expervision (fast, not as accurate, middle-of-the-road price), Nuance (middle-of-the-road speed, accuracy, and price). None of these are written in Java, so you should plan some time to develop JNI code around their APIs.

Good luck - it's a big project!
