PDF coordinate system in iTextSharp


Problem description

I'm working on extracting lines and rectangles from a PDF using iTextSharp. The method I use is shown below:

PdfReader reader = new PdfReader(strPDFFileName);
var pageSize = reader.GetPageSize(1);
var cropBox = reader.GetCropBox(1);
byte[] pageBytes = reader.GetPageContent(1);
PRTokeniser tokeniser = new PRTokeniser(new RandomAccessFileOrArray(pageBytes));
PRTokeniser.TokType tokenType;
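string tokenValue;
// Assumption: buf is used below but never declared in the original snippet;
// it is taken to be a list of all tokens read so far.
List<string> buf = new List<string>();
CoordinateCollection cc = new CoordinateCollection();
while (tokeniser.NextToken())
{
   tokenType = tokeniser.TokenType;
   tokenValue = tokeniser.StringValue;
   // Assumption: every token is buffered, so the four operands sit right before "re".
   buf.Add(tokenValue);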

   if (tokenType == PRTokeniser.TokType.OTHER)
   {
      if (tokenValue == "re")
      {
         if (buf.Count < 5)
         {
            continue;
         }

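         // PDF numbers always use '.' as the decimal separator, so parse culture-independently.
         var inv = System.Globalization.CultureInfo.InvariantCulture;
         float x = float.Parse(buf[buf.Count - 5], inv);
         float y = float.Parse(buf[buf.Count - 4], inv);
         float w = float.Parse(buf[buf.Count - 3], inv);
         float h = float.Parse(buf[buf.Count - 2], inv);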
         Coordinate co = new Coordinate();
         co.type = "re";
         co.X1 = x;
         co.Y1 = y;
         co.W = w;
         co.H = h;
         cc.AddCoordinate(co);
      }
   }
}

The code works fine, but I have run into an issue with the PDF measurement unit. The value returned by reader.GetPageSize is 619 x 792, yet the rectangles I get from the tokeniser have x and y values that always lie beyond the page size; a typical result is x=150, y=4200, w=1500, h=2000.

I believe reader.GetPageSize and the tokeniser use different measurement units.

Could you please tell me how to convert between them?

Recommended answer

As a starting remark: what you actually extract are the coordinate parameters of the re operation in the PDF content stream; their values are not specific to iTextSharp.

To understand why the coordinates of the rectangle seem so far off the page, you first have to realize that the coordinate system used in PDFs is mutable!

The user space coordinate system is merely initialized to a default state in which the CropBox entry in the page dictionary specifies the rectangle of user space corresponding to the visible area.

In the course of the page content operations, the coordinate system may be transformed, even multiple times, using the cm operation. Common transformations are rotations, translations, skews, and scalings.
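As a reference for what cm does to the coordinates, the relations below follow the matrix convention of the PDF specification. The scale factor 0.1 in the last line is purely a hypothetical value chosen for illustration; the actual matrix has to be read from the content stream of the file in question.

```latex
% Operands "a b c d e f cm" define the matrix M, which is
% concatenated in front of the current transformation matrix (CTM):
M = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{pmatrix},
\qquad
\mathrm{CTM}_{\text{new}} = M \cdot \mathrm{CTM}_{\text{old}}

% A point is mapped as a row vector. With the CTM entries written
% as (a, b, c, d, e, f), this gives:
\begin{pmatrix} x' & y' & 1 \end{pmatrix}
  = \begin{pmatrix} x & y & 1 \end{pmatrix} \cdot \mathrm{CTM},
\qquad
x' = a x + c y + e, \quad y' = b x + d y + f

% Hypothetical example: a pure scaling "0.1 0 0 0.1 0 0 cm"
(x, y) = (150, 4200) \;\longmapsto\; (x', y') = (15, 420)
```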

In your case, most likely at least a scaling is in place.

You may want to study the PDF specification (ISO 32000-1, section 8.3, "Coordinate Systems") for details.

To retrieve coordinates with the transformations applied, you have to find the cm operations in addition to the re operations. Furthermore, you have to find the q and Q operations (save and restore graphics state, including the current transformation matrix).
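The bookkeeping this implies can be sketched in a few lines of plain C#. CtmTracker and its method names are hypothetical and not part of iTextSharp; the sketch assumes you call Save, Restore, and Concat yourself whenever your tokeniser loop meets q, Q, and cm, and it tracks only the matrix, not the rest of the graphics state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp: tracks the current
// transformation matrix (CTM) while walking a content stream.
public sealed class CtmTracker
{
    // A matrix is stored as the six PDF operands [a b c d e f].
    private double[] ctm = { 1, 0, 0, 1, 0, 0 };   // identity
    private readonly Stack<double[]> saved = new Stack<double[]>();

    // "q": save the current graphics state (only the CTM in this sketch).
    public void Save() { saved.Push((double[])ctm.Clone()); }

    // "Q": restore the most recently saved CTM, if there is one.
    public void Restore() { if (saved.Count > 0) ctm = saved.Pop(); }

    // "a b c d e f cm": new CTM = M x old CTM.
    public void Concat(double a, double b, double c, double d, double e, double f)
    {
        double[] m = ctm;
        ctm = new[]
        {
            a * m[0] + b * m[2],
            a * m[1] + b * m[3],
            c * m[0] + d * m[2],
            c * m[1] + d * m[3],
            e * m[0] + f * m[2] + m[4],
            e * m[1] + f * m[3] + m[5]
        };
    }

    // Maps a point from the current user space to default user space.
    public double[] Transform(double x, double y)
    {
        return new[]
        {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }
}
```

With a hypothetical "0.1 0 0 0.1 0 0 cm" in effect, Transform(150, 4200) yields approximately (15, 420), which would lie inside the page; after the matching Restore the same call returns the untransformed point again.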

Fortunately, iTextSharp's parser namespace classes can do most of the heavy lifting for you; since version 5.5.6 they also support vector graphics. You merely have to implement IExtRenderListener and parse the content using an instance.

For example, to output vector graphics information on the console, you can use an implementation like this:

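using System;
using System.Collections.Generic;
using iTextSharp.text.pdf.parser;

// Prints each path construction and painting operation, both with the raw
// operands and with the current transformation matrix (Ctm) applied.
class VectorGraphicsListener : IExtRenderListener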
{
    public void ModifyPath(PathConstructionRenderInfo renderInfo)
    {
        if (renderInfo.Operation == PathConstructionRenderInfo.RECT)
        {
            float x = renderInfo.SegmentData[0];
            float y = renderInfo.SegmentData[1];
            float w = renderInfo.SegmentData[2];
            float h = renderInfo.SegmentData[3];
            Vector a = new Vector(x, y, 1).Cross(renderInfo.Ctm);
            Vector b = new Vector(x + w, y, 1).Cross(renderInfo.Ctm);
            Vector c = new Vector(x + w, y + h, 1).Cross(renderInfo.Ctm);
            Vector d = new Vector(x, y + h, 1).Cross(renderInfo.Ctm);

            Console.Out.WriteLine("Rectangle at ({0}, {1}) with size ({2}, {3})", x, y, w, h);
            Console.Out.WriteLine("--> at ({0}, {1}) ({2}, {3}) ({4}, {5}) ({6}, {7})", a[Vector.I1], a[Vector.I2], b[Vector.I1], b[Vector.I2], c[Vector.I1], c[Vector.I2], d[Vector.I1], d[Vector.I2]);
        }
        else
        {
            switch (renderInfo.Operation)
            {
                case PathConstructionRenderInfo.MOVETO:
                    Console.Out.Write("Move to");
                    break;
                case PathConstructionRenderInfo.LINETO:
                    Console.Out.Write("Line to");
                    break;
                case PathConstructionRenderInfo.CLOSE:
                    Console.Out.WriteLine("Close");
                    return;
                default:
                    Console.Out.Write("Curve along");
                    break;
            }
            List<Vector> points = new List<Vector>();
            for (int i = 0; i < renderInfo.SegmentData.Count - 1; i += 2)
            {
                float x = renderInfo.SegmentData[i];
                float y = renderInfo.SegmentData[i + 1];
                Console.Out.Write(" ({0}, {1})", x, y);
                Vector a = new Vector(x, y, 1).Cross(renderInfo.Ctm);
                points.Add(a);
            }
            Console.Out.WriteLine();
            Console.Out.Write("--> at ");
            foreach (Vector point in points)
            {
                Console.Out.Write(" ({0}, {1})", point[Vector.I1], point[Vector.I2]);
            }
            Console.Out.WriteLine();
        }
    }

    public void ClipPath(int rule)
    {
        Console.Out.WriteLine("Clip");
    }

    public iTextSharp.text.pdf.parser.Path RenderPath(PathPaintingRenderInfo renderInfo)
    {
        switch (renderInfo.Operation)
        {
            case PathPaintingRenderInfo.FILL:
                Console.Out.WriteLine("Fill");
                break;
            case PathPaintingRenderInfo.STROKE:
                Console.Out.WriteLine("Stroke");
                break;
            case PathPaintingRenderInfo.STROKE + PathPaintingRenderInfo.FILL:
                Console.Out.WriteLine("Stroke and fill");
                break;
            case PathPaintingRenderInfo.NO_OP:
                Console.Out.WriteLine("Drop");
                break;
        }
        return null;
    }

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
    public void RenderText(TextRenderInfo renderInfo) { }
}

and apply it to a PDF like this:

using (var pdfReader = new PdfReader(....))
{
    // Loop through each page of the document
    for (var page = 1; page <= pdfReader.NumberOfPages; page++)
    {
        VectorGraphicsListener listener = new VectorGraphicsListener();

        PdfReaderContentParser parser = new PdfReaderContentParser(pdfReader);
        parser.ProcessContent(page, listener);
    }
}

After "Rectangle at", "Move to", "Line to", and "Curve along" you'll see the coordinate information without the transformation applied, i.e. the values as you retrieved them.

After "-->" you'll see the corresponding transformed coordinates.
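Once you have the transformed corner points, you can compare them with the page size. The following is a minimal sketch of such a check; RectBounds is a hypothetical helper, not an iTextSharp class, and it assumes the corners are passed as {x, y} pairs and the page box as plain numbers (for example taken from the crop box).

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class RectBounds
{
    // Returns {minX, minY, maxX, maxY} for corners given as {x, y} pairs.
    public static double[] Of(params double[][] corners)
    {
        if (corners.Length == 0) throw new ArgumentException("no corners given");
        double minX = corners[0][0], maxX = corners[0][0];
        double minY = corners[0][1], maxY = corners[0][1];
        foreach (double[] p in corners)
        {
            minX = Math.Min(minX, p[0]); maxX = Math.Max(maxX, p[0]);
            minY = Math.Min(minY, p[1]); maxY = Math.Max(maxY, p[1]);
        }
        return new[] { minX, minY, maxX, maxY };
    }

    // True if the box lies within [left, right] x [bottom, top].
    public static bool FitsInside(double[] box, double left, double bottom,
                                  double right, double top)
    {
        return box[0] >= left && box[1] >= bottom
            && box[2] <= right && box[3] <= top;
    }
}
```

Taking the bounding box of all four corners, rather than only the first and third, keeps the check valid when the transformation includes a rotation or skew.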

PS: This feature is still new. It will probably soon be complemented by an alternative, easier approach in which iTextSharp bundles the path information for you instead of simply forwarding each path construction operation one at a time.
