Google Cloud Vision - Numbers and Numerals OCR

Problem description

I've been trying to implement an OCR program with Python that reads numbers with a specific format, XXX-XXX. I used Google's Cloud Vision API Text Recognition, but the results were unreliable. Out of 30 high-contrast 1280 x 1024 bmp images, only a handful resulted in the correct output, or at least included the correct output in the results. The program tends to omit some numbers, output in non-English languages or sneak in a few special characters.

The goal is to at least output the correct digits in the correct order; it doesn't matter if the results are sprinkled with other junk. Is there a way to help the program recognize numbers better, for example by limiting the results to a specific format, or to numbers only?

Solution

At this moment it is not possible to add constraints or to give a specific expected number format to Vision API requests, as mentioned here (by the Project Manager of Cloud Vision API).

You can also check all the possible request parameters (in the API reference); none of them lets you specify a number format. Currently the only options are:

  • latLongRect: a latitude/longitude rectangle that specifies the location of the image
  • languageHints: indicates the expected language for text_detection (list of supported languages here)
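
As a rough illustration of where languageHints goes, here is a minimal sketch that builds the JSON body for a REST `images:annotate` call using only the standard library. The helper name `build_annotate_request` and the placeholder image bytes are made up for this example, and whether an `"en"` hint actually reduces the non-English output for your images is an assumption you would need to test.

```python
import base64
import json


def build_annotate_request(image_bytes, language_hints=("en",)):
    """Build a request body for the Vision REST images:annotate endpoint.

    Hypothetical helper: it only assembles the JSON, it does not send it.
    """
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                # languageHints lives inside imageContext.
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }


# Placeholder bytes stand in for the contents of a real image file.
body = build_annotate_request(b"\x00placeholder-image-bytes")
print(json.dumps(body, indent=2))
```

The hint only biases language detection; it does not restrict the output to digits or to the XXX-XXX format.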

I assume you already checked out the multiple responses (with different included image regions) to see if you could reconstruct the text using the location of different digits?
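
Since the request itself cannot carry a format constraint, one possible workaround is to do the filtering on the client side. The sketch below assumes the per-fragment entries from `textAnnotations` (that is, everything after the first entry, which holds the full text) have been loaded as dictionaries with `description` and `boundingPoly.vertices` fields, that the number sits on a single line, and that each X in XXX-XXX is a digit. The function names are illustrative only.

```python
import re

# Exactly three digits, a hyphen, three digits, not embedded in a longer digit run.
CODE_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{3}(?!\d)")


def left_edge(annotation):
    """Smallest x coordinate of the fragment's bounding polygon."""
    vertices = annotation["boundingPoly"]["vertices"]
    # A coordinate equal to 0 may be omitted in the JSON, so default to 0.
    return min(v.get("x", 0) for v in vertices)


def reconstruct_line(annotations):
    """Join text fragments left to right (assumes a single line of text)."""
    ordered = sorted(annotations, key=left_edge)
    return "".join(a["description"] for a in ordered)


def find_codes(text):
    """Return every XXX-XXX match found in the text, ignoring surrounding junk."""
    return CODE_PATTERN.findall(text)


# Made-up fragments, deliberately out of order, for illustration.
fragments = [
    {"description": "456", "boundingPoly": {"vertices": [{"x": 300, "y": 10}, {"x": 380, "y": 10}]}},
    {"description": "123", "boundingPoly": {"vertices": [{"y": 10}, {"x": 80, "y": 10}]}},
    {"description": "-", "boundingPoly": {"vertices": [{"x": 250, "y": 10}, {"x": 270, "y": 10}]}},
]
line = reconstruct_line(fragments)
print(line, find_codes(line))
```

This only recovers numbers whose digits were all detected; it cannot repair digits that the API omitted or misread.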

Note that the Vision API and text_detection are not optimized for your data specifically. If you have a lot of annotated data, building your own model with TensorFlow is also an option. This blogpost explains a system setup to detect number plates (with a specific number format). All the code is available on Github, and the problem seems closely related to yours.
