Using LocationTextExtractionStrategy in iTextSharp to get text coordinates

Question

My goal is to extract data from a PDF, where it may be laid out as a table, into an Excel file.

Using LocationTextExtractionStrategy with iTextSharp, we can get the page content as a plain-text string, read from left to right.

How can I proceed so that, when calling

PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy())

the text retains its coordinates in the resulting string?

For instance, if the first line in the PDF has right-aligned text, the resulting string should contain enough leading spaces to keep that content right-aligned.

Please suggest how I might achieve this.

Recommended answer

It's very important to understand that PDFs have no support for tables. Anything that looks like a table is really just a collection of text fragments placed at specific positions over a background of lines. Keep this in mind as you work on this: there is no row or cell structure to query, only text and its coordinates.
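Because a "table" is only positioned text, rebuilding it for Excel comes down to grouping fragments by their coordinates. The C# sketch below is a library-independent illustration, not part of the original answer. It assumes you have already collected each fragment's start X, end X and baseline Y (for example from a custom extraction strategy); the TextChunk and TableRebuilder names and the default tolerances (2 pt per row, 10 pt between cells, 1 pt between words) are hypothetical and will need tuning for each document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One text fragment: where its baseline starts (X), where it ends (EndX), and
// its baseline Y. Coordinates are PDF user space (points, origin bottom-left,
// Y grows upward). Hypothetical type for illustration.
public record TextChunk(float X, float EndX, float Y, string Text);

public static class TableRebuilder
{
    // rowTolerance: max baseline Y difference for fragments on the same row.
    // cellGap:      horizontal gap above which a fragment starts a new cell.
    // wordGap:      gap above which merged fragments are joined with a space.
    public static List<List<string>> ToRows(
        IEnumerable<TextChunk> chunks,
        float rowTolerance = 2f, float cellGap = 10f, float wordGap = 1f)
    {
        // 1. Group fragments into rows, top of the page first.
        var rows = new List<List<TextChunk>>();
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var current = rows.LastOrDefault();
            if (current != null && Math.Abs(current[0].Y - chunk.Y) <= rowTolerance)
                current.Add(chunk);
            else
                rows.Add(new List<TextChunk> { chunk });
        }

        // 2. Within each row, go left to right and start a new cell on large gaps.
        var table = new List<List<string>>();
        foreach (var row in rows)
        {
            var cells = new List<string>();
            float previousEnd = float.NegativeInfinity;
            foreach (var chunk in row.OrderBy(c => c.X))
            {
                float gap = chunk.X - previousEnd;
                if (cells.Count > 0 && gap <= cellGap)
                    cells[cells.Count - 1] += (gap > wordGap ? " " : "") + chunk.Text;
                else
                    cells.Add(chunk.Text);
                previousEnd = chunk.EndX;
            }
            table.Add(cells);
        }
        return table;
    }

    // Joins the rows as CSV with every field quoted, a format Excel can open.
    public static string ToCsv(IEnumerable<List<string>> rows) =>
        string.Join("\n", rows.Select(r =>
            string.Join(",", r.Select(f => "\"" + f.Replace("\"", "\"\"") + "\""))));
}
```

Fixed tolerances are the simplest choice; a more robust variant would derive them from the font size or from the ruling lines drawn on the page.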

That said, you need to implement ITextExtractionStrategy yourself (or subclass an existing strategy such as LocationTextExtractionStrategy) and pass an instance to GetTextFromPage(). See this post for a simple example of that. Then see this post for a more complex example of subclassing. The latter isn't entirely relevant to your goal, but it does show some of the more complex things you can do.
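A minimal sketch of such a strategy, assuming iTextSharp 5.x (the ITextExtractionStrategy interface in iTextSharp.text.pdf.parser). The class name PositionPreservingStrategy and its charWidth and lineTolerance parameters are hypothetical. The strategy records where each fragment's baseline starts and, in GetResultantText(), pads every line with spaces so that a fragment's column is proportional to its X coordinate, which gives right-aligned text the leading spaces the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using iTextSharp.text.pdf.parser;

// Hypothetical strategy (iTextSharp 5.x). Assumption: one output column
// corresponds to charWidth points on the page; choose a value close to the
// average glyph width of the PDF's fonts.
public class PositionPreservingStrategy : ITextExtractionStrategy
{
    private sealed class Chunk
    {
        public float X, Y;
        public string Text;
    }

    private readonly List<Chunk> chunks = new List<Chunk>();
    private readonly float charWidth;
    private readonly float lineTolerance;

    public PositionPreservingStrategy(float charWidth = 5f, float lineTolerance = 2f)
    {
        this.charWidth = charWidth;
        this.lineTolerance = lineTolerance;
    }

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }

    // Called by the parser for each piece of text drawn on the page.
    public void RenderText(TextRenderInfo renderInfo)
    {
        Vector start = renderInfo.GetBaseline().GetStartPoint();
        chunks.Add(new Chunk { X = start[Vector.I1], Y = start[Vector.I2], Text = renderInfo.GetText() });
    }

    public string GetResultantText()
    {
        // Group fragments into lines by baseline Y, top of the page first.
        var lines = new List<List<Chunk>>();
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            var current = lines.LastOrDefault();
            if (current != null && Math.Abs(current[0].Y - chunk.Y) <= lineTolerance)
                current.Add(chunk);
            else
                lines.Add(new List<Chunk> { chunk });
        }

        var result = new StringBuilder();
        foreach (var line in lines)
        {
            var row = new StringBuilder();
            foreach (var chunk in line.OrderBy(c => c.X))
            {
                // Pad with spaces up to the column matching the X position,
                // so right-aligned text ends up with leading spaces.
                int column = (int)Math.Round(chunk.X / charWidth);
                if (row.Length < column)
                    row.Append(' ', column - row.Length);
                row.Append(chunk.Text);
            }
            result.AppendLine(row.ToString());
        }
        return result.ToString();
    }
}
```

It is passed in the same way as the built-in strategy: PdfTextExtractor.GetTextFromPage(reader, i, new PositionPreservingStrategy()). Because positions are rounded to whole columns and real glyph widths vary by font, fragments closer together than charWidth can run into each other, so the output only approximates the page layout.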
