Generate font from an image of text


Problem description

Is it possible to generate a specific font from the image given below?

My idea is to generate a specific font for the image of text below. I would do this by manually selecting portions of the image and mapping them to a set of letters. I would then generate a font from those samples and use it to make the text readable for an OCR engine. Is font generation possible with any open-source implementation? Also, please suggest any good OCR engines.

Solution

Abbyy FineReader 10 gets better-than-expected results but, predictably, gets confused when the characters touch.

Your problem is that the line spacing is too small. The descenders of each line overlap the character bounding boxes of the characters in the line directly below. This makes character segmentation almost impossible because the characters are touching and overlapping. The number of combinations of overlapping characters is virtually impossible to train for. The 'g' and 'y' characters are the worst offenders.
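Overlapping descenders do a lot of damage. The simplest way to see this is with the simplest line splitter, a horizontal projection profile, run on two toy bitmaps. In these bitmaps '#' is ink and '.' is background. The bitmap format and the function names are illustrative assumptions, not part of any OCR engine's API. The splitter needs at least one completely blank row between lines. A single descender pixel crossing into the next line's rows removes that gap, so the two lines come back as one band.

```python
from __future__ import annotations


def row_profile(bitmap: list[str]) -> list[int]:
    """Count the ink pixels ('#') in each row of a text bitmap."""
    return [row.count("#") for row in bitmap]


def split_lines_by_profile(bitmap: list[str]) -> list[tuple[int, int]]:
    """Return (first_row, last_row) bands of ink separated by blank rows.

    This only works when consecutive text lines are separated by at least
    one completely empty row.
    """
    bands: list[tuple[int, int]] = []
    start = None
    for y, count in enumerate(row_profile(bitmap)):
        if count and start is None:
            start = y  # first row of a new band
        elif not count and start is not None:
            bands.append((start, y - 1))  # previous row closed the band
            start = None
    if start is not None:
        bands.append((start, len(bitmap) - 1))  # band runs to the bottom edge
    return bands


# Two "lines" of text with one blank row between them.
spaced = [
    "#.#.#",
    "#.#.#",
    ".....",
    "#.#.#",
    "#.#.#",
]

# The same lines, but a descender from line 1 fills the gap row.
touching = [
    "#.#.#",
    "#.#.#",
    "..#..",
    "#.#.#",
    "#.#.#",
]

print(split_lines_by_profile(spaced))    # [(0, 1), (3, 4)]
print(split_lines_by_profile(touching))  # [(0, 4)]
```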

A double line spaced version of this would probably OCR reasonably well.
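Suppose each line can first be isolated as its own bitmap; this sketch assumes that step is already done. A double-spaced page can then be rebuilt by stacking the lines with blank rows between them. The sketch uses the same toy '#'/'.' bitmap representation. `double_space` is a hypothetical helper, and its default gap of one line height only approximates double spacing.

```python
from __future__ import annotations


def double_space(lines: list[list[str]], gap: int | None = None) -> list[str]:
    """Stack already-separated line bitmaps with blank rows between them.

    If gap is None, the height of the tallest line is used, which roughly
    approximates double line spacing.
    """
    if not lines:
        return []
    width = max((len(row) for line in lines for row in line), default=0)
    if gap is None:
        gap = max(len(line) for line in lines)
    blank = "." * width
    page: list[str] = []
    for i, line in enumerate(lines):
        if i:
            page.extend([blank] * gap)  # the extra leading between lines
        page.extend(row.ljust(width, ".") for row in line)  # pad short rows
    return page


line1 = ["#.#", "###"]
line2 = ["##.", "#.#"]
for row in double_space([line1, line2]):
    print(row)
```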

A custom solution that segmented and separated each line, combined with a good dictionary, would definitely improve the results. There would still be some errors to correct manually, though. The custom routine would have to deal with the ascenders and descenders. It would try to segment the image into lines, which could then be fed to a decent OCR engine. One way would be to analyse every character blob on the page and allocate it to a line. Leptonica (www.leptonica.com, a C imaging library) would probably make this job a little easier.
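Below is a minimal sketch of the blob-to-line allocation described above, written in pure Python rather than with Leptonica. It labels 8-connected ink blobs with a flood fill. It then groups blobs whose vertical centres lie within a `tolerance` of the current line's mean centre. The function names, the toy bitmap format and the centre-based grouping rule are all assumptions made for illustration. Real pages would need a tolerance tuned to the x-height. A descender that physically touches a glyph in the next line would still merge with it into one blob.

```python
from __future__ import annotations

from collections import deque


def find_blobs(bitmap: list[str]) -> list[tuple[int, int, int, int]]:
    """Label 8-connected ink blobs; return bounding boxes as (x0, y0, x1, y1)."""
    seen: set[tuple[int, int]] = set()
    boxes = []
    for y, row in enumerate(bitmap):
        for x, ch in enumerate(row):
            if ch != "#" or (x, y) in seen:
                continue
            # Breadth-first flood fill of one blob, growing its bounding box.
            x0 = x1 = x
            y0 = y1 = y
            seen.add((x, y))
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in (cy - 1, cy, cy + 1):
                    for nx in (cx - 1, cx, cx + 1):
                        if (0 <= ny < len(bitmap) and 0 <= nx < len(bitmap[ny])
                                and bitmap[ny][nx] == "#" and (nx, ny) not in seen):
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def allocate_to_lines(boxes, tolerance: float):
    """Group blob boxes into text lines by their vertical centres.

    Blobs are visited top to bottom by centre; a blob joins the current line
    if its centre is within `tolerance` pixels of that line's mean centre,
    otherwise it starts a new line. Each line is returned left to right.
    """
    def centre(box):
        return (box[1] + box[3]) / 2

    lines = []
    for box in sorted(boxes, key=centre):
        if lines:
            current = lines[-1]
            mean = sum(centre(b) for b in current) / len(current)
            if centre(box) - mean <= tolerance:
                current.append(box)
                continue
        lines.append([box])
    return [sorted(line) for line in lines]


# Line 1 is rows 0-1; the right-hand glyph has a descender (rows 2-3) that
# reaches into line 2's rows without touching any line-2 glyph.
page = [
    "##...##",
    "##...##",
    "......#",
    "##.##.#",
    "##.##..",
]

blobs = find_blobs(page)
for line in allocate_to_lines(blobs, tolerance=1.5):
    print(line)
```

In this example, no row is blank between the two lines, so a projection profile cannot split them. The descender's blob still has its centre near the first line and is allocated there.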

I would not try this without increasing the resolution to 200 or 300 dpi first.
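Rescanning at 200 to 300 dpi is the better option, because upsampling an existing low-resolution image adds no real detail. If rescanning is impossible, a fallback is to enlarge the image so that each glyph covers more pixels. This minimal sketch assumes a known source dpi and reuses the toy bitmap format. It uses nearest-neighbour scaling for simplicity. A real pipeline would more likely use an interpolating resize from an imaging library.

```python
from __future__ import annotations


def scale_factor(source_dpi: int, target_dpi: int = 300) -> int:
    """Smallest integer factor that brings source_dpi up to at least target_dpi."""
    return max(1, -(-target_dpi // source_dpi))  # ceiling division


def upscale_nearest(bitmap: list[str], factor: int) -> list[str]:
    """Nearest-neighbour upscale: every pixel becomes a factor x factor block."""
    scaled: list[str] = []
    for row in bitmap:
        wide = "".join(ch * factor for ch in row)  # repeat each pixel horizontally
        scaled.extend([wide] * factor)             # repeat each row vertically
    return scaled


print(scale_factor(100))  # 3 -> 300 dpi
print(scale_factor(72))   # 5 -> 360 dpi (4 would only reach 288)
for row in upscale_nearest(["#.", ".#"], 2):
    print(row)
```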

With this custom solution, training a font becomes an option if the OCR engine does a poor job initially.
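The question's idea of manually selecting glyphs and mapping them to letters amounts, in its simplest form, to nearest-neighbour template matching. This minimal sketch assumes the glyphs have already been segmented and normalised to the same grid size. It is not how Tesseract or Abbyy train fonts. It only shows why clean segmentation has to come first: the classifier compares whole glyph bitmaps, so two touching characters would never match any single template.

```python
from __future__ import annotations


def hamming(a: list[str], b: list[str]) -> int:
    """Count differing pixels between two glyph bitmaps of identical size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyphs must be normalised to the same size first")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph: list[str], templates: dict[str, list[str]]) -> str:
    """Return the label of the nearest template (fewest differing pixels)."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


# Hypothetical hand-labelled samples, standing in for image regions that
# were selected manually and mapped to letters.
templates = {
    "l": [".#.", ".#.", ".#."],
    "o": ["###", "#.#", "###"],
    "-": ["...", "###", "..."],
}

noisy_o = ["###", "#.#", "##."]  # an 'o' with one pixel lost
print(classify(noisy_o, templates))  # o
```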

Abbyy (www.abbyy.com) or Google Tesseract OCR 3.00 would be a good place to start.
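Whichever engine is used, the "good dictionary" suggested earlier can be applied as a post-processing pass over the recognised text. This minimal sketch uses the standard library's `difflib.get_close_matches`. The word list, the 0.6 cutoff (difflib's default) and the lowercasing of corrected words are simplifying assumptions.

```python
from __future__ import annotations

import difflib
import re


def correct_text(text: str, dictionary: set[str], cutoff: float = 0.6) -> str:
    """Replace unknown words with their closest dictionary entry, if close enough."""
    vocab = sorted(dictionary)

    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in dictionary:
            return word  # already a known word; leave it (and its case) alone
        best = difflib.get_close_matches(word.lower(), vocab, n=1, cutoff=cutoff)
        return best[0] if best else word  # no good candidate: keep the OCR output

    # Only alphabetic runs are checked; digits and punctuation pass through.
    return re.sub(r"[A-Za-z]+", fix, text)


words = {"the", "quick", "brown", "fox", "jumps"}
print(correct_text("tbe qnick brown fox jnmps.", words))
# the quick brown fox jumps.
```

A corrected word comes back in lowercase. A real pass would also need to preserve case and avoid "correcting" proper nouns that are missing from the dictionary.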

There are no guarantees that all of this will work, though. This is quite a difficult page to OCR, and you need to work out whether it would be better to have it typed up manually overseas. That depends on the number of pages you need to process.
