iTextSharp - Reading PDF with 2 columns

Question

I'm having trouble reading a PDF that has a header and a footer, with 2 columns in its body.

I already have the column widths and the height of the header, but I need code to read the pages with columns.

Can anyone provide me with a piece of code that reads a PDF with columns?

Thanks

Recommended answer

It's very hard to achieve what you want if you don't know the position of the columns, but I assume that you have their coordinates because you say "I already have the column widths and height". In that case, your question isn't that different from this other question posted on StackOverflow: iTextSharp read from specific position

Suppose that rect is a Rectangle corresponding to the position of a column; then you need this code:

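// Types come from the iTextSharp.text.pdf and iTextSharp.text.pdf.parser namespaces.
// Keep only the text whose position falls inside rect (the column area).
RenderFilter[] filter = {new RegionTextRenderFilter(rect)};
ITextExtractionStrategy strategy = new FilteredTextRenderListener(
    new LocationTextExtractionStrategy(), filter);
// reader is an open PdfReader; i is the 1-based page number.
String single_column = PdfTextExtractor.GetTextFromPage(reader, i, strategy);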

Now you have the text of a single column. You need to repeat this for every column on your page.
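Repeating the extraction for each column on each page could look roughly like the sketch below. It targets iTextSharp 5.x and makes several assumptions that are not in the original answer: the class name `TwoColumnReader`, the `HeaderHeight` and `FooterHeight` values, and the idea that the two columns split the page exactly in the middle are all hypothetical and should be replaced with your real measurements. It also assumes unrotated pages whose visible area matches the page size. A fresh strategy is created for every column because a `LocationTextExtractionStrategy` accumulates the text it receives.

```csharp
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class TwoColumnReader
{
    // Hypothetical layout values in PDF points; use your own measurements.
    const float HeaderHeight = 60f;
    const float FooterHeight = 40f;

    public static string ReadTwoColumns(string path)
    {
        var text = new StringBuilder();
        var reader = new PdfReader(path);
        try
        {
            // Page numbers are 1-based in iTextSharp.
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                Rectangle page = reader.GetPageSize(i);

                // Body area: the page minus the header and footer bands.
                float bottom = page.Bottom + FooterHeight;
                float top = page.Top - HeaderHeight;

                // Assumption: the columns meet at the horizontal midpoint.
                float middle = (page.Left + page.Right) / 2f;

                Rectangle[] columns =
                {
                    new Rectangle(page.Left, bottom, middle, top),  // left column
                    new Rectangle(middle, bottom, page.Right, top)  // right column
                };

                foreach (Rectangle rect in columns)
                {
                    // New filter and strategy per column so text does not accumulate.
                    RenderFilter[] filter = { new RegionTextRenderFilter(rect) };
                    ITextExtractionStrategy strategy = new FilteredTextRenderListener(
                        new LocationTextExtractionStrategy(), filter);
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return text.ToString();
    }
}
```

Calling `TwoColumnReader.ReadTwoColumns("input.pdf")` (the file name is a placeholder) would return the left column followed by the right column for each page in turn, which is usually the reading order of a two-column layout.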

Extra comment: While in most cases using the RegionTextRenderFilter will work just fine, a few cases (in which columns are created by simply inserting additional space characters in the lines) might require splitting the text chunks before they are filtered. This can be done e.g. by using the TextRenderInfoSplitter from this answer and wrapping the FilteredTextRenderListener in it. (This comment was provided by mkl.)
