Confusion about current transformation matrix in a PDF


Question

    I am somewhat confused about the current transformation matrix (CTM) in PDFs. For page 5 in this PDF, I examined the token stream (http://pastebin.com/k6g4BGih), which shows that the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}. The full output is at http://pastebin.com/9XaPQQm9 .

    Next, I used the following code to extract the line and curve commands from the same page, based on code that @mkl provided in a related SO question:

    1. Main class: http://pastebin.com/htiULanR
    2. Helper classes:

      a. Class that extends PDFGraphicsStreamEngine: http://pastebin.com/zL2p75ha

      b. Path: http://pastebin.com/d3vXCgnC

      c. Subpath: http://pastebin.com/CxunHPiZ

      d. Segment: http://pastebin.com/XP1Dby6U

      e. Rectangle: http://pastebin.com/fNtHNtws

      f. Line: http://pastebin.com/042cgZBp

      g. Curve: http://pastebin.com/wXbXZdqE

    In that code, I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside my override of the curveTo() method of the PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]. So my questions are:

    1. Shouldn't these two CTMs be the same?

    2. Both these CTMs have scaling operations: the first one scales with a factor of 10 and the second one scales with a factor of 0.1. If I ignore the scaling, I can create an SVG which looks fairly close to the original PDF. But I am confused why that should happen. Do I need to consider all transformation matrices before the path instead of the last one?

    Solution

    First of all: You say

    the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}.

    This is not correct: cm does not set the transformation matrix to the parameter values. Instead, it multiplies the matrix given by its parameters with the former current transformation matrix and makes the result the new current transformation matrix, a process also called concatenation. Thus:

    1. Shouldn't these two CTMs be the same?

    No, because cm doesn't set, it concatenates!
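To make the concatenation step concrete: the six numbers a b c d e f stand for the 3×3 matrix with rows [a b 0], [c d 0], [e f 1], and cm replaces the CTM with the product of the operand matrix and the old CTM. The following is a minimal plain-Java sketch of that product. The class and method names (CtmConcat, concat) are made up for illustration and are not part of PDFBox.

```java
import java.util.Arrays;

final class CtmConcat {

    /**
     * Returns m x ctm, where both arguments are PDF matrices given as
     * {a, b, c, d, e, f}. This is the value the CTM takes after "m cm".
     */
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],           // a
            m[0] * ctm[1] + m[1] * ctm[3],           // b
            m[2] * ctm[0] + m[3] * ctm[2],           // c
            m[2] * ctm[1] + m[3] * ctm[3],           // d
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],  // e
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]   // f
        };
    }

    public static void main(String[] args) {
        double[] oldCtm = {0.1, 0, 0, 0.1, 0, 0};  // CTM in effect before the cm
        double[] operand = {10, 0, 0, 10, 0, 0};   // operands of "10 0 0 10 0 0 cm"
        // The scale by 10 is combined with the existing scale by 0.1,
        // so the resulting CTM is the identity, not a scale by 10.
        System.out.println(Arrays.toString(concat(operand, oldCtm)));
    }
}
```

So the operands of a cm instruction only tell you the new CTM if you also know the CTM that was in effect before it.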

    Furthermore, the current transformation matrix (and all other graphics state values!) is changed not only by the explicit setter or concatenator instructions but also by the restore-state instruction (Q), which you currently ignore. Thus:

    2. Do I need to consider all transformation matrices before the path instead of the last one?

    You may have to consider more than the last, but only those not undone by graphics state restoration.


    Let's look at your example document...

    To keep track of the current transformation matrix, you have to inspect both the cm and the q/Q instructions. For your page 5, the content stream up to the first c (curve) instruction, reduced to those instructions, looks like this:

    q 0.1 0 0 0.1 0 0 cm
    q
    q 10 0 0 10 0 0 cm BT
    [...large text object...]
    ET Q
    Q
    q 
    [...clip path definition...]
    q 10 0 0 10 0 0 cm BT 
    [...small text object...]
    ET Q
    Q
    q 
    [...new clip path definition...]
    0.737761 w
    1 i
    2086.54 2327.82 m
    2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c 
    

    Assuming the initial transformation matrix is the identity, this implies the following sequence of values for the current transformation matrix and for the matrices saved on the graphics state stack:

    CTM: 1 0 0 1 0 0

    Stack: empty

    q
    

    CTM: 1 0 0 1 0 0

    Stack: 1 0 0 1 0 0

    0.1 0 0 0.1 0 0 cm
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0

    q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

    q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

    10 0 0 10 0 0 cm
    

    CTM: 1 0 0 1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

    BT
    [...large text object...]
    ET Q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

    Q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0

    q 
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

    [...clip path definition...]
    q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

    10 0 0 10 0 0 cm
    

    CTM: 1 0 0 1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

    BT 
    [...small text object...]
    ET Q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

    Q
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0

    q 
    

    CTM: 0.1 0 0 0.1 0 0

    Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

    [...new clip path definition...]
    0.737761 w
    1 i
    2086.54 2327.82 m
    2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c 
    

    Thus, PDFBox is correct when you observe:

    I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]
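The bookkeeping traced above can be replayed with a small state machine. The following is a rough sketch in plain Java, not PDFBox code: the class CtmTracker and its methods are hypothetical names, the graphics state is reduced to the CTM alone, and the tiny parser assumes its input consists only of numbers and the operators q, Q and cm (text objects and path operators are left out).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class CtmTracker {
    private double[] ctm = {1, 0, 0, 1, 0, 0};                // assumed initial identity
    private final Deque<double[]> stack = new ArrayDeque<>(); // saved CTMs

    void save() { stack.push(ctm.clone()); }                  // q

    void restore() {                                          // Q
        if (!stack.isEmpty()) ctm = stack.pop();
    }

    void concatenate(double[] m) {                            // cm: CTM = m x CTM
        ctm = new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    double[] current() { return ctm.clone(); }

    int depth() { return stack.size(); }

    /** Maps a point from user space through the current CTM. */
    double[] transform(double x, double y) {
        return new double[] {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }

    /** Replays a whitespace-separated stream of numbers, q, Q and cm. */
    static CtmTracker replay(String content) {
        CtmTracker tracker = new CtmTracker();
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q": tracker.save(); operands.clear(); break;
                case "Q": tracker.restore(); operands.clear(); break;
                case "cm": {
                    double[] m = new double[6];
                    for (int i = 0; i < 6; i++) {
                        m[i] = operands.get(operands.size() - 6 + i);
                    }
                    tracker.concatenate(m);
                    operands.clear();
                    break;
                }
                default: operands.add(Double.parseDouble(token));
            }
        }
        return tracker;
    }

    public static void main(String[] args) {
        // The q / Q / cm skeleton of page 5 up to the first curve.
        CtmTracker tracker = replay("q 0.1 0 0 0.1 0 0 cm q q 10 0 0 10 0 0 cm Q Q "
                + "q q 10 0 0 10 0 0 cm Q Q q");
        System.out.println(Arrays.toString(tracker.current()));
        // Start point of the first curve, mapped through the CTM.
        System.out.println(Arrays.toString(tracker.transform(2086.54, 2327.82)));
    }
}
```

Replaying the skeleton ends with a CTM that scales by 0.1 and two saved states on the stack, matching the last step of the trace. Mapping the path coordinates through that CTM (for example the start point 2086.54 2327.82) scales them down by a factor of ten, which is the step to apply when exporting the path to another format such as SVG.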
