iText: get content size

Question

I just spent a few hours scouring the web. It seems others also have this issue, but I couldn't find an answer.

I have a whole bunch of PDF files whose measurements I need, namely the height and width of each page's content.

In Adobe Illustrator, when you import a PDF you have the option of trimming to the "bounding box". That's exactly what I need.

I tried many approaches; here's the hodgepodge:

Dim pdfStream = IO.File.OpenRead(FilePath)
Dim img = PdfImages(pdfStream)
Dim pdfReader = New PdfReader(pdfStream)
Dim pdfDictionary = pdfReader.GetPageN(1)
Dim mediaBox = pdfDictionary.GetAsArray(PdfName.MEDIABOX)
Dim b = pdfReader.GetPageSize(pdfDictionary)
Dim ms = New MemoryStream
Dim document = New Document(pdfReader.GetPageSizeWithRotation(1))
Dim writer = PdfWriter.GetInstance(document, ms)
document.Open()
document.SetPageSize(pdfReader.GetPageSize(1))
document.NewPage()
Dim cb = writer.DirectContent
cb.Clip()
Dim pageImport = writer.GetImportedPage(pdfReader, 1)
pdfReader.Close()
pdfStream.Close()

All I manage to get is the page size, which is useless. I tried this on a whole bunch of PDFs, so it's not a case of one corrupt file or something.

Recommended answer

To accomplish your goal of "trimming to the 'bounding box'", you actually have to solve two problems:


  1. You have to change the crop boxes of the individual pages of a PDF document.
  2. You have to determine the bounding box of each page, i.e. (as I assume) the smallest box (with horizontal and vertical sides) containing all visible content of the page.

Ad 1) change the crop boxes of the individual pages

You should not use the code you found for that task. Manipulating a single document is almost always best done using a PdfStamper, not a PdfWriter.

The iText in Action, 2nd Edition sample CropPages.java / CropPages.cs shows how to do that. The central method:

public byte[] ManipulatePdf(byte[] src)
{
  PdfReader reader = new PdfReader(src);
  int n = reader.NumberOfPages;
  PdfDictionary pageDict;
  // Fixed crop rectangle (llx, lly, urx, ury), applied to every page
  PdfRectangle rect = new PdfRectangle(55, 76, 560, 816);
  for (int i = 1; i <= n; i++)
  {
    // Page numbers are 1-based
    pageDict = reader.GetPageN(i);
    pageDict.Put(PdfName.CROPBOX, rect);
  }
  using (MemoryStream ms = new MemoryStream())
  {
    using (PdfStamper stamper = new PdfStamper(reader, ms))
    {
      // Nothing to do here: disposing the stamper writes the changed PDF to ms
    }
    return ms.ToArray();
  }
}

(The code works in memory, i.e. it expects a byte[] and returns one, but it can easily be revised to work with the file system.)

As you see, you actually manipulate the PDF as present in the PdfReader and then use the PdfStamper only to store the changed PDF.

In your case, though, there is no fixed rectangle for all pages; instead, you have to determine the rectangle for each page...

Ad 2) determine the bounding box of some page

To determine the bounding box, you actually have to parse the whole page content and determine the dimensions of each drawn element.
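Independent of any PDF library, the last step of that computation is a plain union of the per-element boxes. A minimal sketch in Java, assuming each drawn element has already been reduced to a box {llx, lly, urx, ury} in page coordinates; the `ContentBounds` class is a hypothetical helper, not part of iText, and extracting the per-element boxes from the page content is the part the parser has to do:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not an iText class): smallest axis-aligned box around a set of boxes. */
final class ContentBounds {

    /**
     * Each input box is {llx, lly, urx, ury}. Returns the union in the same
     * layout, or null when there are no boxes (an empty page has no bounds).
     */
    static float[] union(List<float[]> boxes) {
        if (boxes.isEmpty()) {
            return null;
        }
        float llx = Float.POSITIVE_INFINITY;
        float lly = Float.POSITIVE_INFINITY;
        float urx = Float.NEGATIVE_INFINITY;
        float ury = Float.NEGATIVE_INFINITY;
        for (float[] b : boxes) {
            llx = Math.min(llx, b[0]); // leftmost edge seen so far
            lly = Math.min(lly, b[1]); // lowest edge seen so far
            urx = Math.max(urx, b[2]); // rightmost edge seen so far
            ury = Math.max(ury, b[3]); // highest edge seen so far
        }
        return new float[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        // Made-up element boxes for illustration only
        List<float[]> elements = Arrays.asList(
            new float[] {72f, 700f, 300f, 720f},
            new float[] {100f, 400f, 500f, 650f});
        System.out.println(Arrays.toString(union(elements)));
    }
}
```

The hard part is not this union but obtaining a correct box for every element, which is where library support matters.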

Unfortunately, iText(Sharp) supports this comfortably only up to a point: it provides a content parsing framework, but this framework does not yet handle vector graphics out of the box.

The iText in Action, 2nd Edition sample ShowTextMargins.java / ShowTextMargins.cs shows how you can use that framework to determine the crop box (vector graphics ignored). The essential code:

PdfReaderContentParser parser = new PdfReaderContentParser(reader);
[...]
TextMarginFinder finder = parser.ProcessContent(i, new TextMarginFinder());

After that ProcessContent call, the finder provides the coordinates of the lower left and upper right corners of the bounding box of page i (vector graphics ignored) via finder.GetLlx(), finder.GetLly(), finder.GetUrx(), and finder.GetUry(). You can use these values to construct the rectangle passed to pageDict.Put(PdfName.CROPBOX, rect) in the code above.

If you also need to take vector graphics into account, though, you'll have to extend the parser namespace classes somewhat so that they also create parsing events for vector graphics operators, and extend the TextMarginFinder to take those events into account. For more on this, read this answer.
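To give a rough idea of what handling such vector graphics events involves, here is a library-independent sketch in Java. It assumes the extended parser hands over path coordinates together with the current transformation matrix; the `PathBounds` class and its method names are hypothetical and not part of iText. It is deliberately simplistic: it ignores clipping, line widths, and whether a path is ever actually stroked or filled.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

/** Hypothetical accumulator (not an iText class) for the bounds of path points. */
final class PathBounds {
    private Rectangle2D.Double bounds; // stays null until the first point arrives

    /** Maps one path point through the current transformation matrix and folds it in. */
    void addPoint(AffineTransform ctm, double x, double y) {
        Point2D p = ctm.transform(new Point2D.Double(x, y), null);
        if (bounds == null) {
            bounds = new Rectangle2D.Double(p.getX(), p.getY(), 0, 0);
        } else {
            bounds.add(p); // grows the rectangle just enough to contain p
        }
    }

    /** A rectangle operator (x y width height) contributes its four corners. */
    void addRectangle(AffineTransform ctm, double x, double y, double w, double h) {
        addPoint(ctm, x, y);
        addPoint(ctm, x + w, y);
        addPoint(ctm, x + w, y + h);
        addPoint(ctm, x, y + h);
    }

    /** Returns the accumulated bounds, or null if no point was added. */
    Rectangle2D getBounds() {
        return bounds;
    }

    public static void main(String[] args) {
        PathBounds pb = new PathBounds();
        // Made-up example: a rectangle drawn while the content is scaled by 2
        pb.addRectangle(AffineTransform.getScaleInstance(2, 2), 10, 20, 30, 40);
        System.out.println(pb.getBounds());
    }
}
```

All four corners are transformed, not just two opposite ones, because a rotating or skewing matrix can make any corner the extreme one. Feeding curve control points through addPoint would give a safe but possibly too large box.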
