How to preserve document structure in tesseract

Views: 123 Published: 2020/5/19 19:24:11 Tags: ocr tesseract

Problem description

I am using Tesseract OCR to extract text from an image. Preserving the structure of the document is very important to me. Currently Tesseract does not preserve the structure; in fact, it changes the order of the text. My input is the image below.

The output I get is as follows:

Someto the left
Someto the left

Some in the middle
Some in the middle

Some with some tab
Some with some tab

Some with some space between them
Some with some space between them

Sometext here
Sometext here

this much
this much

How can I get the desired output, with the same structure as in the image?

That is, as follows:

                                                 Some text here
                                                 Some text here

Some to the left
Some to the left

                    Some in the middle
                    Some in the middle

        Some with some tab
        Some with some tab

Some with some space between them                       this much
Some with some space between them                       this much
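
One possible way to approach the layout above, offered only as a rough sketch: if word positions are available (for example the `left`, `top` and `text` columns of Tesseract's TSV output), each word can be placed at a text column proportional to its x coordinate. The function name `render_layout` and the parameters `char_width` and `line_tolerance` are hypothetical and chosen for illustration; the pixel-per-character value is an assumption that would need tuning for a real image.

```python
def render_layout(words, char_width=10, line_tolerance=10):
    """Rebuild a plain-text layout from word boxes.

    words: iterable of (left, top, text) tuples, in pixels.
    char_width: assumed pixel width of one character (hypothetical default).
    line_tolerance: max difference in `top` for words to share a line.
    """
    # Group words into lines by their top coordinate.
    lines = []  # each entry: (top_of_first_word, [(left, text), ...])
    for left, top, text in sorted(words, key=lambda w: (w[1], w[0])):
        if lines and abs(top - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((left, text))
        else:
            lines.append((top, [(left, text)]))

    rendered = []
    for _, items in lines:
        row = ""
        for left, text in sorted(items):
            # Map the pixel offset to a character column.
            col = left // char_width
            # Never overwrite earlier text; keep at least one space.
            if row and col <= len(row):
                col = len(row) + 1
            row = row.ljust(col) + text
        rendered.append(row)
    return "\n".join(rendered)


if __name__ == "__main__":
    sample = [
        (490, 0, "Some"), (540, 0, "text"), (580, 0, "here"),
        (0, 40, "Some"), (50, 42, "to"), (80, 40, "the"), (120, 40, "left"),
    ]
    print(render_layout(sample))
```

This sketch only restores horizontal indentation; vertical gaps between lines are not reproduced. Tesseract also has a `preserve_interword_spaces` configuration variable (for example `-c preserve_interword_spaces=1` together with `--psm 6`) that may be worth trying first, although whether it is sufficient for a given image is not something this sketch can guarantee.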
