Is it possible to extract text from scanned PDF documents and images in ASP.NET?

Question

Is it possible to extract text from scanned PDF documents and images in ASP.NET? If so, please help me.

Recommended answer

To get text from a PDF, look into iTextSharp; have a look at "Converting PDF to Text in C#".

To extract text from an image you need to do OCR; have a look at Google Tesseract and Traceract.
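
The two suggestions above can be sketched in C# roughly as follows. This is a minimal sketch, not the answerer's own code. It assumes the iTextSharp 5.x NuGet package for PDFs and the "Tesseract" NuGet package (a .NET wrapper around the Tesseract engine) for OCR; the thread does not say which wrapper was meant. The class name TextExtraction, the method names, and the tessDataDir parameter are hypothetical and chosen only for illustration.

```csharp
using System.Text;
using iTextSharp.text.pdf;          // assumed: iTextSharp 5.x NuGet package
using iTextSharp.text.pdf.parser;
using Tesseract;                    // assumed: "Tesseract" NuGet wrapper

// Hypothetical helper class, for illustration only.
public static class TextExtraction
{
    // Reads the embedded text layer of a PDF, page by page.
    // A purely scanned PDF has no text layer, so the result will be
    // empty or nearly empty for such a file.
    public static string ExtractPdfText(string pdfPath)
    {
        var sb = new StringBuilder();
        var reader = new PdfReader(pdfPath);
        try
        {
            // iTextSharp page numbers start at 1.
            for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
            {
                sb.AppendLine(PdfTextExtractor.GetTextFromPage(
                    reader, pageNumber, new SimpleTextExtractionStrategy()));
            }
        }
        finally
        {
            reader.Close();
        }
        return sb.ToString();
    }

    // Runs OCR on a single image file (for example PNG or TIFF).
    // tessDataDir must contain the trained data for the chosen language,
    // here eng.traineddata for English.
    public static string OcrImage(string imagePath, string tessDataDir)
    {
        using (var engine = new TesseractEngine(tessDataDir, "eng", EngineMode.Default))
        using (var image = Pix.LoadFromFile(imagePath))
        using (var page = engine.Process(image))
        {
            return page.GetText();
        }
    }
}
```

For a scanned PDF, ExtractPdfText would typically return little or nothing, because the pages are stored as images. In that case each page first has to be rendered or extracted as an image and then passed to something like OcrImage; that rendering step is not shown here.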


Yes, I agree with Simon's comments in Solution 1. This requirement is related to OCR ("Optical Character Recognition").

You can use the Microsoft SDK below for this.

Microsoft Office Document Imaging:
http://social.technet.microsoft.com/Forums/en-US/officeappcompat/thread/93d6f285-dc98-46e2-b7e0-872bba9c4e35/

I evaluated several third-party OCR SDKs in one of my assignments. If you are open to a third-party OCR SDK, search for the following SDKs on Google.

1) Nuance OmniPage OCR
2) Accusoft SmartZone OCR


I agree with Simon: if the PDF is a scanned image, you will need OCR, which is not easy.

Please see my advice on OCR in my past solution, "OCR Software".

Good luck,
—SA

