Can tesseract be trained for non-font symbols?

Question

I'm curious about how I may be able to more reliably recognise the value and the suit of playing card images. Here are two examples:

There may be some noise in the images, but I have a large dataset of images that I could use for training (roughly 10k pngs, including all values & suits).

I can reliably recognise images that I've manually classified when I have a known exact match from a hashing method. But because I hash images based on their content, the slightest noise changes the hash and the image is treated as unknown. This is what I'm looking to address reliably with further automation.
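To make the failure mode concrete, here is a minimal sketch of an exact-match lookup. The question does not say which hash is used, so SHA-256 over raw pixel bytes is an assumption, and the 4x4 "image" and the `known` table are hypothetical stand-ins for real card crops.

```python
import hashlib

def content_hash(pixels: bytes) -> str:
    """Exact-match hash of raw pixel bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(pixels).hexdigest()

# Hypothetical 4x4 grayscale image, one byte per pixel.
clean = bytes([200] * 16)
# The same image with a single pixel nudged by one grey level.
noisy = bytes([200] * 15 + [201])

# Lookup table built from manually classified images.
known = {content_hash(clean): "4c"}

print(known.get(content_hash(clean), "unknown"))  # exact match is found
print(known.get(content_hash(noisy), "unknown"))  # one changed pixel misses
```

Any hash over the exact content behaves this way, which is why a classifier that tolerates small pixel differences is attractive here.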

I've been reviewing the 3.05 documentation on training tesseract: https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#automated-method

Can tesseract only be trained with images found in fonts? Or could I use it to recognise the suits for these cards?

I was hoping that I could say that all images in this folder correspond to 4c (e.g. the example images above), and that tesseract would see the similarity in any future instances of that image (regardless of noise) and also read that as 4c. Is this possible? Does anyone here have experience with this?

Recommended answer

This has been my non-tesseract solution to this, until someone proves there's a better way. I've set up:

Getting these running was the hardest part. Next, I used my dataset to train a new Caffe network. I arranged my dataset in a single-depth folder structure:

./card
./card/2c
./card/2d
./card/2h
./card/2s
./card/3c
./card/3d
./card/3h
./card/3s
./card/4c
./card/4d
./card/4h
./card/4s
./card/5c
./card/5d
./card/5h
./card/5s
./card/6c
./card/6d
./card/6h
./card/6s
./card/7c
./card/7d
./card/7h
./card/7s
./card/8c
./card/8d
./card/8h
./card/8s
./card/9c
./card/9d
./card/9h
./card/9s
./card/_noise
./card/_table
./card/Ac
./card/Ad
./card/Ah
./card/As
./card/Jc
./card/Jd
./card/Jh
./card/Js
./card/Kc
./card/Kd
./card/Kh
./card/Ks
./card/Qc
./card/Qd
./card/Qh
./card/Qs
./card/Tc
./card/Td
./card/Th
./card/Ts
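The layout above is one folder per class: 52 cards plus `_noise` and `_table`. A small sketch of generating it follows. The helper names `class_labels` and `make_class_dirs` are hypothetical, and the rank and suit letters are taken from the listing.

```python
from pathlib import Path

RANKS = "23456789TJQKA"          # T stands for ten, as in the listing
SUITS = "cdhs"                   # clubs, diamonds, hearts, spades
EXTRA = ["_noise", "_table"]     # the two non-card classes

def class_labels():
    """Return all 54 class names: 13 ranks x 4 suits plus the extras."""
    return [rank + suit for rank in RANKS for suit in SUITS] + EXTRA

def make_class_dirs(root):
    """Create one sub-folder per class under root and return their names."""
    root = Path(root)
    for label in class_labels():
        (root / label).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling `make_class_dirs("card")` would create the folders shown above. Sorting each image into its folder remains a manual classification step.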

Within DIGITS, I chose:

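  1. The "Datasets" tab
  2. New Dataset > Images
  3. Classification
  4. I pointed it at my card folder, e.g. /path/to/card
  5. I set the validation percentage to 13.0%, based on the discussion here: https://stackoverflow.com/a/13612921/880837
  6. After creating the dataset, I opened the "Models" tab
  7. Selected my new dataset.
  8. Chose GoogLeNet under "Standard Networks", and left it to train.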
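To illustrate what a 13% validation percentage means in practice, the sketch below holds out that share of a list of image paths. The tool performs its own split internally, so this is only an illustration and not its implementation. The function name and the fixed seed are assumptions.

```python
import random

def split_validation(paths, val_fraction=0.13, seed=0):
    """Shuffle paths and hold out val_fraction of them for validation."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(paths)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (training, validation)

# Hypothetical file names standing in for one class folder.
train, val = split_validation([f"img{i}.png" for i in range(100)])
print(len(train), len(val))
```

The held-out images are never used for weight updates, so the accuracy reported on them gives a fairer estimate of how the model handles unseen cards.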

I repeated this several times, once for each batch of new images in the dataset. Each training session took 6-10 hours, but at this stage I can use my caffemodel to programmatically estimate what each image is expected to be, using this logic: https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp

The results are either a card (2c, 7h, etc.), noise, or table. Any estimate with a confidence above 90% is most likely correct. The latest run correctly recognised 300 out of 400 images, with only 3 mistakes. I'm adding new images to the dataset and retraining the existing model to further improve the accuracy of the results. Hope this is valuable to others!
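The 90% cut-off can be sketched as a small decision function. This assumes the classifier's top predictions have already been collected as (label, probability) pairs. The name `decide` and the "unknown" fallback are hypothetical and not part of the Caffe example.

```python
def decide(predictions, threshold=0.90):
    """Return the top label if its probability exceeds threshold.

    predictions: list of (label, probability) pairs for one image.
    Falls back to "unknown" so low-confidence images can be reviewed by hand.
    """
    label, prob = max(predictions, key=lambda pair: pair[1])
    return label if prob > threshold else "unknown"

print(decide([("4c", 0.97), ("4s", 0.02), ("_noise", 0.01)]))
print(decide([("7h", 0.55), ("7d", 0.40), ("_table", 0.05)]))
```

Images that fall below the threshold are natural candidates to label manually and add to the dataset before the next retraining run.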

I've only given the high-level steps here. This was all done with large thanks to David Humphrey and his GitHub post; I really recommend reading it and trying it out if you're interested in learning more: https://github.com/humphd/have-fun-with-machine-learning
