Scene Text Image Super-Resolution for OCR



I am working on an OCR system. One challenge I'm facing when recognizing text within the ROI is camera shake or motion blur, or text that is out of focus because of the shooting angle. Please consider the following demo sample:

Notice the text marked in red, for example: in such cases the OCR system can't recognize it properly. The same problem also occurs without an angled shot, when the image is so blurry that the OCR system fails to recognize the text or recognizes it only partially. Sometimes the images are blurry; sometimes they are very low resolution or pixelated. For example:

Methods we've tried

First, we tried various methods available on SO, but sadly with no luck.

Next, we tried the following three methods, which looked the most promising.

1. TSRN

A recent research work (TSRN) focuses mainly on such cases. Its main idea is to introduce super-resolution (SR) techniques as a pre-processing step. This implementation looks by far the most promising. However, it fails to work its magic on our custom dataset (for example, the blue text in the second image above). Here are some examples from their demonstration:

2. Neural Enhance

After looking at the illustrations on its page, we believed it might work. Sadly, it couldn't address the problem either. I was also a bit confused by their showcased examples, because I couldn't reproduce them. I've raised an issue on GitHub where I demonstrate this in more detail. Here are some examples from their demonstration:

3. ISR

This implementation was our last choice, tried with minimal hope. No luck either.

Update 1

• [Method]: Apart from the above, we also tried some traditional approaches such as the Out-of-focus Deblur Filter (Wiener filter and also the unsupervised Wiener filter). We also checked the Richardson-Lucy method, but saw no improvement with this approach either.

• [Method]: We've checked out a GAN-based deblurring solution, DeblurGAN. I have tried this network. What attracted me was its Blind Motion Deblurring approach.

Lastly, from this discussion we came across this research work, which looks really good. We haven't tried it yet.
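For readers unfamiliar with the Wiener filter mentioned in Update 1, here is a minimal NumPy sketch of frequency-domain Wiener deconvolution. It is not the code used in the question. It assumes the blur kernel (PSF) is known and Gaussian, and it uses a constant noise-to-signal ratio `nsr`; the function names (`gaussian_psf`, `psf_to_otf`, `blur`, `wiener_deconvolve`, `mse`) are hypothetical. In real photos the PSF is unknown and has to be estimated first.

```python
import numpy as np


def gaussian_psf(size=9, sigma=2.0):
    """Return a normalized 2-D Gaussian PSF (an assumed blur model)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()


def psf_to_otf(psf, shape):
    """Zero-pad the PSF to `shape`, move its center to (0, 0), and FFT it."""
    padded = np.zeros(shape, dtype=float)
    kh, kw = psf.shape
    padded[:kh, :kw] = psf
    # Centering at the origin avoids a spatial shift in the filtered image.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def blur(image, psf):
    """Circularly convolve `image` with `psf` to simulate out-of-focus blur."""
    otf = psf_to_otf(psf, image.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))


def wiener_deconvolve(blurred, psf, nsr=1e-3):
    """Apply the Wiener filter conj(H) / (|H|^2 + nsr) in the frequency domain."""
    otf = psf_to_otf(psf, blurred.shape)
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + nsr)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * wiener))


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sharp = np.zeros((64, 64))
    sharp[20:44, 28:36] = 1.0  # a synthetic bright stroke standing in for text
    psf = gaussian_psf()
    observed = blur(sharp, psf) + rng.normal(0.0, 0.001, sharp.shape)
    restored = wiener_deconvolve(observed, psf, nsr=1e-3)
    print("MSE blurred :", mse(observed, sharp))
    print("MSE restored:", mse(restored, sharp))
```

The `nsr` term keeps the filter from dividing by near-zero frequency responses; a larger value suppresses noise more strongly at the cost of a softer result.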

Update 2

1. [Method]: Real-World Super-Resolution via Kernel Estimation and Noise Injection. Tried this method. It looked promising, but it didn't work in our case. Code.

2. [Method]: Photo Restoration. Compared with all the methods above, it surprisingly performs the best at text super-resolution for OCR. It greatly reduces noise, blurriness, etc., makes the image much clearer, and improves model generalization. Code.

My Query

Is there any effective workaround for such cases? Are there any methods that could improve such blurry or low-resolution text, whether it is close to the camera or far away because of the camera angle?

Solution

Currently, one solution is Real-World Super-Resolution via Kernel Estimation and Noise Injection. The authors propose a degradation framework, RealSR, which provides realistic images for super-resolution learning. It is a promising method for super-resolving images affected by camera shake or motion blur.

The method is divided into two stages. The first stage, Realistic Degradation for Super-Resolution, is to estimate the degradation from real data and generate realistic LR images.

The second stage, Super-Resolution Model, is to train the SR model based on the constructed data.
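The first stage can be pictured as blurring a clean HR image with an estimated kernel, downsampling it, and then injecting noise, roughly LR = downsample(HR * k) + n. The sketch below is only an illustration of that idea, not the RealSR code: the kernel pool and noise pool are synthetic stand-ins (the real method estimates them from real images), and every function name here is hypothetical.

```python
import numpy as np


def gaussian_kernel(size=9, sigma=1.5):
    """Synthetic stand-in for an estimated blur kernel (odd `size` expected)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def convolve2d_same(image, kernel):
    """2-D convolution with edge-replicated padding and same-size output."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]  # flip so this is convolution, not correlation
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out


def degrade(hr, kernel, noise_patch, scale=4):
    """Blur, downsample by `scale`, then inject a noise patch and clip to [0, 1]."""
    blurred = convolve2d_same(hr, kernel)
    lr = blurred[::scale, ::scale]
    h, w = lr.shape
    return np.clip(lr + noise_patch[:h, :w], 0.0, 1.0)


def make_training_pair(hr, kernel_pool, noise_pool, rng, scale=4):
    """Draw one kernel and one noise patch at random and build an (LR, HR) pair."""
    kernel = kernel_pool[rng.integers(len(kernel_pool))]
    noise = noise_pool[rng.integers(len(noise_pool))]
    return degrade(hr, kernel, noise, scale), hr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hr = rng.random((64, 64))  # placeholder for a clean grayscale HR image in [0, 1]
    # Assumption: synthetic pools; the real method collects these from real data.
    kernel_pool = [gaussian_kernel(9, s) for s in (1.0, 1.5, 2.0)]
    noise_pool = [rng.normal(0.0, 0.02, (16, 16)) for _ in range(4)]
    lr, target = make_training_pair(hr, kernel_pool, noise_pool, rng, scale=4)
    print(lr.shape, target.shape)
```

Pairs built this way are what the second stage would train the SR model on; the closer the kernel and noise pools match the target camera, the more realistic the LR images.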

You can take a look at the GitHub repository: https://github.com/jixiaozhong/RealSR
