Open source OCR for .NET

Question

Is there an open source OCR library for .NET that can extract text from scanned PDFs, even when the text is set in different fonts, and output the result as HTML, XML, or plain text?

Recommended answers

Use these links:
OCR[^]
OCR Source code[^]


Don't limit yourself to .NET.

OCR has been a solved problem for years -- well before .NET came out, and open source projects tend to use non-proprietary languages.

I was part of the team that produced one of the first commercially successful OCR products for the PC in 1988. I would expect that most open source OCR projects were started in the early '90s.

There are probably very good open source solutions out there -- most likely in C++.

You are going to be a lot happier if you select the best quality OCR available and then do the work to interface with it -- rather than settling for inferior OCR just because it's easy to incorporate into your project.

A quick search turns up this project:

http://code.google.com/p/tesseract-ocr/

Apparently it was pretty accurate back in 1995 and Google has adopted it and done a lot of work on it since 2006.

It's already ported to Windows and builds with VS2008/2010 -- so all you have to do is interface your .NET code with it.
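
One low-effort way to do that interfacing is to launch the `tesseract` command-line tool as a child process and read the file it writes. The sketch below shows only the shape of that call. It is written in Python for brevity; from .NET the same argument list could be passed to `System.Diagnostics.Process`. The helper names `build_tesseract_command` and `run_ocr` are made up for this illustration, and it assumes `tesseract` is on the PATH and that each scanned PDF page has already been converted to an image, since Tesseract reads image files rather than PDFs.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", hocr=False):
    """Build the argument list for the tesseract command-line tool.

    Hypothetical helper for illustration; the CLI shape is
    `tesseract <image> <outputbase> [-l lang] [configfile]`.
    """
    command = ["tesseract", image_path, output_base, "-l", lang]
    if hocr:
        # The "hocr" config file asks tesseract for HTML-based hOCR output
        # instead of plain text.
        command.append("hocr")
    return command


def run_ocr(image_path, output_base, lang="eng", hocr=False):
    """Run tesseract on one page image (assumes it is installed and on PATH)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang, hocr)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(command, check=True)
```

Calling `run_ocr("page1.png", "page1")` would leave plain text in `page1.txt`; with `hocr=True` the output is an HTML-based hOCR file whose extension depends on the Tesseract version. Calling the library directly from .NET is also possible, but the process-based route avoids any native interop code.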

