Confusion about current transformation matrix in a PDF
Problem description
I am confused about the current transformation matrix (CTM) in PDFs. For page 5 in this PDF, I examined the token stream (http://pastebin.com/k6g4BGih), and it shows that the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}. The full output is at http://pastebin.com/9XaPQQm9 .
Next, I used the following code to extract the line and curve commands from the same page, following code that @mkl provided in a related SO question:
- Main class: http://pastebin.com/htiULanR
- Helper classes:
  a. Class that extends PDFGraphicsStreamEngine: http://pastebin.com/zL2p75ha
  b. Path: http://pastebin.com/d3vXCgnC
  c. Subpath: http://pastebin.com/CxunHPiZ
  d. Segment: http://pastebin.com/XP1Dby6U
  e. Rectangle: http://pastebin.com/fNtHNtws
  f. Line: http://pastebin.com/042cgZBp
  g. Curve: http://pastebin.com/wXbXZdqE
In that code, I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from the PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]. So my questions are:
- Shouldn't these two CTMs be the same?
- Both of these CTMs are scaling operations: the first scales by a factor of 10 and the second by a factor of 0.1. If I ignore the scaling, I can create an SVG that looks fairly close to the original PDF, but I am confused about why that should happen. Do I need to consider all transformation matrices before the path instead of the last one?
Recommended answer
First of all, you say:
"the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}."
This is not correct. cm does not set the transformation matrix to the parameter values; it multiplies the matrix parameter by the former current transformation matrix and sets the result as the new current transformation matrix, a process also called concatenation. Thus:
- Shouldn't these two CTMs be the same?
No, because cm doesn't set, it concatenates!
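As a minimal sketch of what concatenation means, the product can be written out by hand. The class CtmConcat and its concat helper are illustrative names, not PDFBox API, and the sketch assumes the usual PDF convention that the six numbers a b c d e f stand for the 3x3 matrix with rows (a b 0), (c d 0), (e f 1), with the cm operand multiplied onto the previous CTM from the left:

```java
import java.util.Arrays;

final class CtmConcat {
    /**
     * Returns m x ctm, where both arguments are PDF matrices given as
     * {a, b, c, d, e, f}. This is the new CTM after "a b c d e f cm".
     */
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],           // a'
            m[0] * ctm[1] + m[1] * ctm[3],           // b'
            m[2] * ctm[0] + m[3] * ctm[2],           // c'
            m[2] * ctm[1] + m[3] * ctm[3],           // d'
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],  // e'
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]   // f'
        };
    }

    public static void main(String[] args) {
        double[] ctm = {1, 0, 0, 1, 0, 0};                       // identity
        ctm = concat(new double[] {0.1, 0, 0, 0.1, 0, 0}, ctm);  // 0.1 0 0 0.1 0 0 cm
        System.out.println(Arrays.toString(ctm));
        ctm = concat(new double[] {10, 0, 0, 10, 0, 0}, ctm);    // 10 0 0 10 0 0 cm
        System.out.println(Arrays.toString(ctm));
    }
}
```

Starting from the identity, concatenating 0.1 0 0 0.1 0 0 and then 10 0 0 10 0 0 yields a uniform scale of 1 again (up to floating-point rounding), so the operand of the last cm alone does not tell you the CTM.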
Furthermore, the current transformation matrix (and every other graphics state value!) is changed not only by explicit setter or concatenator instructions but also by the restore-state instruction, which you currently ignore. Thus:
- Do I need to consider all transformation matrices before the path instead of the last one?
You may have to consider more than the last, but only those not undone by graphics state restoration.
Let's look at your example document...
To keep track of the current transformation matrix, you have to inspect both the cm and the q/Q instructions. For your page 5, the content stream up to the first c curve instruction, reduced to those instructions, looks like this:
q 0.1 0 0 0.1 0 0 cm
q
q 10 0 0 10 0 0 cm BT
[...large text object...]
ET Q
Q
q
[...clip path definition...]
q 10 0 0 10 0 0 cm BT
[...small text object...]
ET Q
Q
q
[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c
Assuming the transformation matrix starts as the identity, this implies the following sequence of values for the current transformation matrix (CTM) and for the matrices saved on the graphics state stack:
CTM: 1 0 0 1 0 0
Stack: empty
q
CTM: 1 0 0 1 0 0
Stack: 1 0 0 1 0 0
0.1 0 0 0.1 0 0 cm
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
10 0 0 10 0 0 cm
CTM: 1 0 0 1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
BT
[...large text object...]
ET Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
[...clip path definition...]
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
10 0 0 10 0 0 cm
CTM: 1 0 0 1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0
BT
[...small text object...]
ET Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
Q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0
q
CTM: 0.1 0 0 0.1 0 0
Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0
[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c
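The trace above can be replayed mechanically. The following is only a sketch under stated assumptions: CtmTracker and replay are hypothetical names, the input is the q/Q/cm skeleton of the stream typed in by hand (text objects, clip paths and all other operators omitted), and java.awt.geom.AffineTransform stands in for the PDF matrix, since its concatenate method applies the new matrix before the existing one, as cm does.

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CtmTracker {
    /** Replays q, Q and cm from a whitespace-separated skeleton and returns the final CTM. */
    static AffineTransform replay(String content) {
        AffineTransform ctm = new AffineTransform();          // identity
        Deque<AffineTransform> stack = new ArrayDeque<>();    // graphics state stack (CTM only)
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q": {                                   // save: push a copy of the CTM
                    stack.push(new AffineTransform(ctm));
                    break;
                }
                case "Q": {                                   // restore: throws if the stack is empty
                    ctm = stack.pop();
                    break;
                }
                case "cm": {                                  // concatenate the six operands a b c d e f
                    double[] m = new double[6];
                    for (int i = 0; i < 6; i++) {
                        m[i] = operands.get(i);
                    }
                    ctm.concatenate(new AffineTransform(m));
                    operands.clear();
                    break;
                }
                default:                                      // anything else is a numeric operand here
                    operands.add(Double.parseDouble(token));
            }
        }
        return ctm;
    }

    public static void main(String[] args) {
        String skeleton = "q 0.1 0 0 0.1 0 0 cm q q 10 0 0 10 0 0 cm Q Q "
                        + "q q 10 0 0 10 0 0 cm Q Q q";
        System.out.println(replay(skeleton));
    }
}
```

Each 10 0 0 10 0 0 cm sits inside its own q ... Q pair, so its effect is undone before the path operators are reached, and only the initial 0.1 scale remains in force.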
Thus, PDFBox is correct when you observe:
"I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from the PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]"
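To see what this CTM does to the path coordinates, the matrix can be applied to the start point of the first curve. This is a hedged illustration rather than part of the original code: CtmApply and transformPoint are hypothetical names, and the CTM is typed in by hand from the value printed above.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class CtmApply {
    /** Maps a point given in content stream coordinates through the CTM. */
    static Point2D transformPoint(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // CTM in force at the first curve: 0.1 0 0 0.1 0 0
        AffineTransform ctm = new AffineTransform(0.1, 0, 0, 0.1, 0, 0);
        // Operands of "2086.54 2327.82 m", the start point of the first curve
        Point2D p = transformPoint(ctm, 2086.54, 2327.82);
        System.out.println(p.getX() + " " + p.getY());
    }
}
```

Because this CTM is a uniform scale with no translation, skipping it shrinks or enlarges every coordinate by the same factor, which is presumably why an SVG built from the raw operands still looks close to the original page apart from its size.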