Tesseract OCR force pattern

Views: 112 Published: 2020/5/19 19:30:54 Tags: regex ocr tesseract


Question

I want to read a specific character sequence with Tesseract, as in this post: Tesseract OCR: is it possible to force a specific pattern?

I tried the bazaar pattern-matching configuration in Tesseract with the pattern \d\d\d\A\A, but the OCR still recognizes other words that don't match.

I also tried the "tessedit_char_whitelist" parameter, but it does not let me constrain which characters appear at which position.

I run the command: tesseract image.jpg result -l eng bazaar and I get this message:

Please provide at least 4 concrete characters at the beginning of the pattern

Invalid user pattern \A\A\d\d\d

Tesseract Open Source OCR Engine v3.01 with Leptonica

image.jpg:

The result:

AB123
ABC12
A1234
12345
ABCD1

So the result is wrong: I only wanted to capture the sequence "AB123".

Can somebody tell me why the regular expression in my user-patterns file has no effect? For the configuration, I strictly followed the bazaar tutorial.

Recommended answer

Try using this pattern with quantifiers instead.

[a-zA-Z]{2}\d{3}

This should match only 2 alphabetical characters followed by 3 digits.
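The answer does not say where this regular expression is meant to be applied. As a minimal sketch, assuming it is used as a post-processing filter in Python on text that Tesseract has already produced (and not inside the user-patterns file), the lines from the question can be filtered like this. The function name filter_ocr_lines is hypothetical and chosen only for illustration.

```python
import re

# Pattern from the answer: exactly two ASCII letters followed by three digits.
PATTERN = re.compile(r"[a-zA-Z]{2}\d{3}")


def filter_ocr_lines(text):
    """Return the lines of OCR output that match PATTERN in full.

    Hypothetical helper: it assumes the OCR text is already available
    as a string, one candidate sequence per line.
    """
    kept = []
    for line in text.splitlines():
        candidate = line.strip()
        # fullmatch rejects lines with extra leading or trailing characters.
        if PATTERN.fullmatch(candidate):
            kept.append(candidate)
    return kept


# The five lines reported in the question.
ocr_output = """AB123
ABC12
A1234
12345
ABCD1"""

print(filter_ocr_lines(ocr_output))  # prints ['AB123']
```

Using fullmatch instead of search matters here: search would also accept a longer line that merely contains two letters followed by three digits.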

The reason everything matched before is that \w matches any alphanumeric character.
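To illustrate the point about \w, here is a small sketch using Python's re module. This is an assumption for illustration only, since the question uses Tesseract's own pattern syntax, not Python regular expressions. A pattern built from \w accepts letters and digits at every position, so all five lines from the question pass.

```python
import re

# The five lines reported in the question.
lines = ["AB123", "ABC12", "A1234", "12345", "ABCD1"]

# \w accepts letters, digits, and underscore at each of the five positions.
loose = re.compile(r"\w{5}")

loose_hits = [s for s in lines if loose.fullmatch(s)]
print(len(loose_hits))  # prints 5
```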
