C# code to extract text from a scanned PDF document?


Question

Can anyone point me to some C# code examples for extracting text from a scanned PDF document? I've gone through many posts but couldn't find one that explains clearly how to do this. The libraries used in those posts are not free. Some have restrictions, such as only extracting the first three pages of a PDF document; to extract the whole document, they ask me to download the full version of the library, which is not free.

Please tell me how to do this without spending money.

Recommended answer

Hi,

Please refer to the following URL:

http://www.codeproject.com/Questions/243295/Is-this-possible-to-Extract-Text-from-Scanned-PDF

You can use Tesseract OCR for .NET: https://code.google.com/p/tesseractdotnet/
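
To make the Tesseract suggestion concrete, here is a minimal sketch of one free way to do it. It makes several assumptions that are not in the answer: it calls the Tesseract command-line program (assumed to be installed and on the PATH) rather than the .NET wrapper linked above, and it assumes the pages of the scanned PDF have already been exported as image files with whatever tool you prefer. The class and method names (`TesseractCli`, `BuildArguments`, `OcrImage`) are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: runs the "tesseract" command-line program on one page image.
public static class TesseractCli
{
    // Builds the arguments for: tesseract <image> <outputBase> -l <language>
    // Paths are wrapped in quotes so spaces survive; embedded quotes are not handled.
    public static string BuildArguments(string imagePath, string outputBase, string language)
    {
        return $"\"{imagePath}\" \"{outputBase}\" -l {language}";
    }

    // OCRs a single image file and returns the recognized text.
    public static string OcrImage(string imagePath, string language = "eng")
    {
        // Tesseract writes its result to <outputBase>.txt, so use a unique temp name.
        string outputBase = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));

        var startInfo = new ProcessStartInfo
        {
            FileName = "tesseract", // assumption: the executable is on the PATH
            Arguments = BuildArguments(imagePath, outputBase, language),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    $"tesseract exited with code {process.ExitCode}");
            }
        }

        string textFile = outputBase + ".txt";
        try
        {
            return File.ReadAllText(textFile);
        }
        finally
        {
            File.Delete(textFile); // clean up the temporary output file
        }
    }
}
```

Calling `TesseractCli.OcrImage` once per page image, in page order, and concatenating the returned strings should give the text of the whole document with no page limit. The file names and the default language code `eng` are placeholders; recognition quality depends on the scan resolution and on having the matching language data installed.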

