如何在C#项目中实施和执行OCR? [英] How to implement and do OCR in a C# project?

查看:131
本文介绍了如何在C#项目中实施和执行OCR?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经搜索了一段时间,并且看到了一些OCR库请求.我想知道如何实现最简单,易于安装的OCR库,并使用详细信息将其安装到C#项目中.

I ve been searching for a while and all that i ve seen some OCR library requests. I would like to know how to implement the purest, easy to install and use OCR library with detailed info for installation into a C# project.

如果可能的话,我只是想像通常的dll引用一样实现它...

If posible, I just wanna implement it like a usual dll reference...

示例:

using org.pdfbox.pdmodel;
using org.pdfbox.util;

还有一些OCR代码示例将很不错,例如:

Also a little OCR code example would be nice, such as:

public string OCRFromBitmap(Bitmap Bmp)
{
    Bmp.Save(temppath, System.Drawing.Imaging.ImageFormat.Tiff);
    string OcrResult = Analyze(temppath);
    File.Delete(temppath);
    return OcrResult;
}

所以请考虑我对OCR项目不熟悉,并给我一个答案,就像和假人交谈一样.

So please consider that I'm not familiar to OCR projects and give me an answer like talking to a dummy.

我想人们误解了我的要求.我想知道如何将这些开源OCR库实现到C#项目中以及如何使用它们.给出为dup的链接根本没有提供我要求的答案.

I guess people misunderstood my request. I wanted to know how to implement those open source OCR libraries to a C# project and how to use them. The link given as dup is not giving answers that I requested at all.

推荐答案

如果有人对此进行了研究,我一直在尝试不同的选择,并且以下方法会产生很好的效果.以下是获取工作示例的步骤:

If anyone is looking into this, I've been trying different options and the following approach yields very good results. The following are the steps to get a working example:

  1. .tesseract的.NET包装器添加到您的项目中.可以通过NuGet软件包Install-Package Tesseract( https://github.com/charlesw/tesseract )添加.
  2. 转到官方下载部分. code.google.com/p/tesseract-ocr/">Tesseract 项目( https://code. google.com/p/tesseract-ocr/ 现在位于此处: https: //github.com/tesseract-ocr/langdata ).
  3. 下载首选语言数据,例如:tesseract-ocr-3.02.eng.tar.gz English language data for Tesseract 3.02.
  4. 在项目中创建tessdata目录,并将语言数据文件放入其中.
  5. 转到新添加文件的Properties并将其设置为在构建时复制.
  6. 添加对System.Drawing的引用.
  7. 从.NET Wrapper存储库的Samples目录中,将示例phototest.tif文件复制到您的项目目录中,并将其设置为在构建时进行复制.
  8. 在您的项目中创建以下两个文件(仅供入门):
  1. Add .NET Wrapper for tesseract to your project. It can be added via NuGet package Install-Package Tesseract(https://github.com/charlesw/tesseract).
  2. Go to the Downloads section of the official Tesseract project (https://code.google.com/p/tesseract-ocr/ It's now located here: https://github.com/tesseract-ocr/langdata).
  3. Download the preferred language data, example: tesseract-ocr-3.02.eng.tar.gz English language data for Tesseract 3.02.
  4. Create tessdata directory in your project and place the language data files in it.
  5. Go to Properties of the newly added files and set them to copy on build.
  6. Add a reference to System.Drawing.
  7. From .NET Wrapper repository, in the Samples directory copy the sample phototest.tif file into your project directory and set it to copy on build.
  8. Create the following two files in your project (just to get started):

Program.cs

using System;
using Tesseract;
using System.Diagnostics;

namespace ConsoleApplication
{
    class Program
    {
        public static void Main(string[] args)
        {
            var testImagePath = "./phototest.tif";
            if (args.Length > 0)
            {
                testImagePath = args[0];
            }

            try
            {
                var logger = new FormattedConsoleLogger();
                var resultPrinter = new ResultPrinter(logger);
                using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
                {
                    using (var img = Pix.LoadFromFile(testImagePath))
                    {
                        using (logger.Begin("Process image"))
                        {
                            var i = 1;
                            using (var page = engine.Process(img))
                            {
                                var text = page.GetText();
                                logger.Log("Text: {0}", text);
                                logger.Log("Mean confidence: {0}", page.GetMeanConfidence());

                                using (var iter = page.GetIterator())
                                {
                                    iter.Begin();
                                    do
                                    {
                                        if (i % 2 == 0)
                                        {
                                            using (logger.Begin("Line {0}", i))
                                            {
                                                do
                                                {
                                                    using (logger.Begin("Word Iteration"))
                                                    {
                                                        if (iter.IsAtBeginningOf(PageIteratorLevel.Block))
                                                        {
                                                            logger.Log("New block");
                                                        }
                                                        if (iter.IsAtBeginningOf(PageIteratorLevel.Para))
                                                        {
                                                            logger.Log("New paragraph");
                                                        }
                                                        if (iter.IsAtBeginningOf(PageIteratorLevel.TextLine))
                                                        {
                                                            logger.Log("New line");
                                                        }
                                                        logger.Log("word: " + iter.GetText(PageIteratorLevel.Word));
                                                    }
                                                } while (iter.Next(PageIteratorLevel.TextLine, PageIteratorLevel.Word));
                                            }
                                        }
                                        i++;
                                    } while (iter.Next(PageIteratorLevel.Para, PageIteratorLevel.TextLine));
                                }
                            }
                        }
                    }
                }
            }
            catch (Exception e)
            {
                Trace.TraceError(e.ToString());
                Console.WriteLine("Unexpected Error: " + e.Message);
                Console.WriteLine("Details: ");
                Console.WriteLine(e.ToString());
            }
            Console.Write("Press any key to continue . . . ");
            Console.ReadKey(true);
        }



        private class ResultPrinter
        {
            readonly FormattedConsoleLogger logger;

            public ResultPrinter(FormattedConsoleLogger logger)
            {
                this.logger = logger;
            }

            public void Print(ResultIterator iter)
            {
                logger.Log("Is beginning of block: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Block));
                logger.Log("Is beginning of para: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Para));
                logger.Log("Is beginning of text line: {0}", iter.IsAtBeginningOf(PageIteratorLevel.TextLine));
                logger.Log("Is beginning of word: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Word));
                logger.Log("Is beginning of symbol: {0}", iter.IsAtBeginningOf(PageIteratorLevel.Symbol));

                logger.Log("Block text: \"{0}\"", iter.GetText(PageIteratorLevel.Block));
                logger.Log("Para text: \"{0}\"", iter.GetText(PageIteratorLevel.Para));
                logger.Log("TextLine text: \"{0}\"", iter.GetText(PageIteratorLevel.TextLine));
                logger.Log("Word text: \"{0}\"", iter.GetText(PageIteratorLevel.Word));
                logger.Log("Symbol text: \"{0}\"", iter.GetText(PageIteratorLevel.Symbol));
            }
        }
    }
}

FormattedConsoleLogger.cs

using System;
using System.Collections.Generic;
using System.Text;
using Tesseract;

namespace ConsoleApplication
{
    public class FormattedConsoleLogger
    {
        const string Tab = "    ";
        private class Scope : DisposableBase
        {
            private int indentLevel;
            private string indent;
            private FormattedConsoleLogger container;

            public Scope(FormattedConsoleLogger container, int indentLevel)
            {
                this.container = container;
                this.indentLevel = indentLevel;
                StringBuilder indent = new StringBuilder();
                for (int i = 0; i < indentLevel; i++)
                {
                    indent.Append(Tab);
                }
                this.indent = indent.ToString();
            }

            public void Log(string format, object[] args)
            {
                var message = String.Format(format, args);
                StringBuilder indentedMessage = new StringBuilder(message.Length + indent.Length * 10);
                int i = 0;
                bool isNewLine = true;
                while (i < message.Length)
                {
                    if (message.Length > i && message[i] == '\r' && message[i + 1] == '\n')
                    {
                        indentedMessage.AppendLine();
                        isNewLine = true;
                        i += 2;
                    }
                    else if (message[i] == '\r' || message[i] == '\n')
                    {
                        indentedMessage.AppendLine();
                        isNewLine = true;
                        i++;
                    }
                    else
                    {
                        if (isNewLine)
                        {
                            indentedMessage.Append(indent);
                            isNewLine = false;
                        }
                        indentedMessage.Append(message[i]);
                        i++;
                    }
                }

                Console.WriteLine(indentedMessage.ToString());

            }

            public Scope Begin()
            {
                return new Scope(container, indentLevel + 1);
            }

            protected override void Dispose(bool disposing)
            {
                if (disposing)
                {
                    var scope = container.scopes.Pop();
                    if (scope != this)
                    {
                        throw new InvalidOperationException("Format scope removed out of order.");
                    }
                }
            }
        }

        private Stack<Scope> scopes = new Stack<Scope>();

        public IDisposable Begin(string title = "", params object[] args)
        {
            Log(title, args);
            Scope scope;
            if (scopes.Count == 0)
            {
                scope = new Scope(this, 1);
            }
            else
            {
                scope = ActiveScope.Begin();
            }
            scopes.Push(scope);
            return scope;
        }

        public void Log(string format, params object[] args)
        {
            if (scopes.Count > 0)
            {
                ActiveScope.Log(format, args);
            }
            else
            {
                Console.WriteLine(String.Format(format, args));
            }
        }

        private Scope ActiveScope
        {
            get
            {
                var top = scopes.Peek();
                if (top == null) throw new InvalidOperationException("No current scope");
                return top;
            }
        }
    }
}

这篇关于如何在C#项目中实施和执行OCR?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆