Windows OCR engine fails to recognize the text in canvas (converted to bitmap)


Problem description



I have a cordova project where I have a "scribble pad" where the user can scribble their notes. This is a simple canvas object, and I'd like to get the OCR Engine to convert it into text. I'm struggling to convert the canvas data into the software bitmap that OCR Engine supports.

All the samples are based either on loading a file from storage or on reading a stream from the camera. Do I have to save this canvas to a file on the device and read it back into a stream?

I'd welcome any guidance here, as images are something I struggle with.

[Update]

So, I've managed to somehow get the stream, but unfortunately, OCR is not recognizing it.

I have the canvas object, and after the page is loaded I place text into it, so any capable OCR should be able to read it. I also have an "img" element for checking whether the stream is correct and contains the correct bitmap. Here is the code that handles the conversion from canvas to OCR recognition:

    var blob = canvas.msToBlob();

    // This is the stream I'll use for OCR detection
    var randomAccessStream = blob.msDetachStream();

    // This is the stream I'll use for the image element, to make sure the stream above contains what I've placed into the canvas
    var blob2 = MSApp.createBlobFromRandomAccessStream("image/png", randomAccessStream.cloneStream());
    // AngularJS scope model
    $scope.imageUrl = URL.createObjectURL(blob2);

    // This works, but returns ""
    var scope = this;
    if (!this.ocrEngine)
        return;

    var bitmapDecoder = Windows.Graphics.Imaging.BitmapDecoder;
    bitmapDecoder.createAsync(randomAccessStream).then(function (decoder) {
        return decoder.getSoftwareBitmapAsync();
    }).then(function (bitmap) {
        return scope.ocrEngine.recognizeAsync(bitmap);
    }).then(function (result) {
        console.log(result.text);
    });

After all of this runs, the image is given its src, loads, and shows exactly what is in the canvas, so the stream is correct.

The ocrEngine is set up as follows:

    var Globalization = Windows.Globalization;
    var OCR = Windows.Media.Ocr;
    this.ocrEngine = OCR.OcrEngine.tryCreateFromUserProfileLanguages();
    if (!this.ocrEngine) {
        // Try to create OcrEngine for specified language.
        // If language is not supported on device, method returns null.
        this.ocrEngine = OCR.OcrEngine.tryCreateFromLanguage(new Globalization.Language("en-us"));
    }
    if (!this.ocrEngine) {
        console.error("Selected language is not available.");
    }

Why is OCR not recognizing a simple 'Hello World'?

Solution

Well, it was rather embarrassing to realize why the OCR failed to read anything, even system-rendered text: the generated image had a transparent background. Once I included a rectangle with a white fill, it all started to work correctly.
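The white-fill fix can be sketched roughly as follows. This is a minimal illustration and not the exact code from the project: `addWhiteBackground` is a hypothetical helper name, and it assumes a standard 2D canvas context. Instead of drawing the rectangle before the text, it uses the `destination-over` compositing mode to paint white behind whatever is already on the canvas, so existing scribbles are kept.

```javascript
// Hypothetical helper (name is an assumption, not from the original post).
// Paints an opaque white rectangle *behind* the existing canvas content,
// so the exported bitmap no longer has a transparent background.
function addWhiteBackground(canvas) {
    var ctx = canvas.getContext("2d");
    ctx.save();
    // "destination-over" draws new pixels underneath the existing ones.
    ctx.globalCompositeOperation = "destination-over";
    ctx.fillStyle = "#ffffff";
    ctx.fillRect(0, 0, canvas.width, canvas.height);
    ctx.restore();
    return canvas;
}
```

Under these assumptions, the helper would be called once, right before `canvas.msToBlob()`, so that the stream handed to the OCR engine already contains the opaque background.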

Unfortunately, the OCR still struggles to recognize anything I scribble on the canvas. Handwritten numbers and multiline text, for example, are not recognized (see below).

[Images not preserved: one sample that was recognized, followed by three samples that were not recognized.]

Then I found the Windows.UI.Input.Inking namespace, and I reckon that's the only way to go.
