PDF Tj command with angle brackets?
Problem description
I'm trying to figure out where in an uncompressed PDF v1.4 document the Times font is used.
The /Font object describing the Times font within the PDF is object 65, as follows:
65 0 obj
<</Type /Font
/Subtype /TrueType
/BaseFont /PXAAAD+TimesNewRoman,Italic
/FirstChar 1
/LastChar 35
/Widths [250 333 333 333 500 500 500 500 500 500 500 500 500 500 333 722 722 833 666 610 500 556 500 443 443 500 277 443 500 389 389 277 500 443 500]
/FontDescriptor 205 0 R
/ToUnicode 206 0 R>>
endobj
It refers to /FontDescriptor object 205 to further define the Times font object, and to a /ToUnicode map in object 206, which describes the byte-to-Unicode character mapping. Edit: after Ritsaert's initial answer to the question below, I'm adding the font's /ToUnicode object here, to provide the mentioned CMap.
206 0 obj
<</Length 208 0 R>>
stream
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
35 beginbfchar
<01> <0020>
<02> <0028>
<03> <0029>
<04> <002d>
<05> <0030>
<06> <0031>
<07> <0032>
...
<23> <0101>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream
endobj
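For anyone who wants to read such a map programmatically, here is a minimal Python sketch that pulls the `<src> <dst>` pairs out of the bfchar section. It is only an illustration under stated assumptions: the helper name `parse_bfchar` is made up, `CMAP_TEXT` contains just the entries visible above (the elided ones are left out), and `beginbfrange` sections are not handled.

```python
import re

# Only the bfchar entries visible in object 206 above; the elided ones are omitted.
CMAP_TEXT = """
35 beginbfchar
<01> <0020>
<02> <0028>
<03> <0029>
<04> <002d>
<05> <0030>
<06> <0031>
<07> <0032>
<23> <0101>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {character code: Unicode string} for each bfchar pair (hypothetical helper)."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            # The destination is UTF-16BE encoded text written as hex digits.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

to_unicode = parse_bfchar(CMAP_TEXT)
print(to_unicode[0x06], to_unicode[0x05])
```

A real ToUnicode stream may also use `beginbfrange`, so this would need extending before being used on arbitrary files.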
I've now tracked down the use of the Times font object to a /Page object (one of many) like the following one, which refers to font object 65 through the /F4 reference in its page /Resources:
12 0 obj
<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 432 648]
/Contents 92 0 R
/Resources <</Font <</F1 62 0 R
/F3 64 0 R
/F4 65 0 R>>
/ProcSet [/PDF /Text]>>
/Group <</S /Transparency
/CS /DeviceRGB>>>>
endobj
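The lookup from resource name to font object can be sketched in Python as follows. This is a rough illustration, not a PDF parser: the function name `font_resources` is hypothetical, and it assumes the /Font dictionary is written inline in the page object, holds only indirect references, and contains no nested dictionaries.

```python
import re

PAGE_OBJECT = """12 0 obj
<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 432 648]
/Contents 92 0 R
/Resources <</Font <</F1 62 0 R
/F3 64 0 R
/F4 65 0 R>>
/ProcSet [/PDF /Text]>>
/Group <</S /Transparency
/CS /DeviceRGB>>>>
endobj"""

def font_resources(page_text):
    """Map each font resource name to its (object number, generation) pair."""
    match = re.search(r"/Font\s*<<(.*?)>>", page_text, re.S)
    if match is None:
        return {}
    entries = re.findall(r"/(\S+)\s+(\d+)\s+(\d+)\s+R", match.group(1))
    return {name: (int(num), int(gen)) for name, num, gen in entries}

print(font_resources(PAGE_OBJECT)["F4"])
```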
The /Contents stream (object 92 in the PDF file) is then full of text objects (enclosed in BT and ET), none of which contains readable text; instead they use angle brackets full of numbers. For example, here is the only reference to the Times font /F4 whose use I'm trying to find:
92 0 obj
<</Length 93 0 R>>
stream
...
BT
0.5020 g
72.0000 615.1512 Td
/F4 12.0000 Tf
<0605> Tj
ET
...
endstream
endobj
But what do the angle brackets and the number <0605> refer to? A specific glyph in the font table? Looking at section 5.3.2 of the PDF reference, I can't find any mention of the angle brackets.
Given the above code and the accepted answer that <0605> is a hex encoding of text, the bytes of <0605> are the entries <06> and <05> in the CMap object 206 and thus map to Unicode <0031> and <0030> respectively. That means the string <0605> refers to U+0031 (a "1") followed by U+0030 (a "0"), so the Times font is used for the string "10" on page object 12.
Recommended answer
Here is what is going on:
In the content stream, the Tj command is given the string <0605> to draw. A string between < and > is a hex string, hence the characters #6 and #5 are drawn. The notation is explained in section 3.2.3 of the linked PDF reference.
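To make the hex string notation concrete, here is a small Python sketch that turns such a token into raw bytes. The function name `parse_hex_string` is made up for illustration. It follows the rules for hex strings as I understand them from the PDF reference: whitespace between digits is ignored, and a missing final digit is treated as 0.

```python
def parse_hex_string(token):
    """Convert a PDF hex string token such as '<0605>' into bytes."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a hex string token: %r" % token)
    # Whitespace inside the brackets carries no meaning.
    digits = "".join(token[1:-1].split())
    # An odd number of digits means the last digit is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)

print(list(parse_hex_string("<0605>")))
```

With a single-byte font like this one, each resulting byte is one character code, so `<0605>` yields the codes 6 and 5.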
Just before the text draw command, the font F4 is selected using the Tf command.
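The pairing of Tf and Tj can be illustrated with a rough Python sketch that walks the content stream and records which strings are shown while each font is selected. This is an assumption-laden toy, not a real content stream parser: `strings_by_font` is a made-up name, the tokenizer ignores TJ arrays, escaped or nested parentheses, comments and inline images, and it treats any purely alphabetic token as an operator.

```python
import re

# The excerpt of object 92 shown in the question (elided parts left out).
CONTENT_STREAM = """BT
0.5020 g
72.0000 615.1512 Td
/F4 12.0000 Tf
<0605> Tj
ET"""

TOKEN = re.compile(r"<[^<>]*>|\([^()]*\)|[^\s<>()]+")
OPERATOR = re.compile(r"[A-Za-z*'\"]+")

def strings_by_font(content):
    """Return {font resource name: [string operands passed to Tj]}."""
    shown = {}
    current_font = None
    operands = []
    for token in TOKEN.findall(content):
        if OPERATOR.fullmatch(token):
            if token == "Tf" and len(operands) >= 2:
                current_font = operands[-2]  # operands are: /Name size
            elif token == "Tj" and operands:
                shown.setdefault(current_font, []).append(operands[-1])
            operands = []  # operands belong to the operator that follows them
        else:
            operands.append(token)
    return shown

print(strings_by_font(CONTENT_STREAM))
```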
The /Resources dictionary of the page maps F4 to object 65, generation 0. This font object is a subsetted TrueType font in which character codes 1..35 are defined. No /Encoding is specified, so the codes are interpreted through the font's built-in encoding rather than a named one such as WinAnsiEncoding. In other words, the embedded subset font arranges its characters in a non-standard order, which happens quite often.
Now, if you want to know how these codes are linked to Unicode characters: the font has a /ToUnicode entry pointing to a stream that contains a CMap defining the mapping. This should be sufficient to convert the string to a Unicode string.
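As a worked example of that conversion, here is a short Python sketch. The names `TO_UNICODE` and `decode_shown_string` are hypothetical, the table holds only the two CMap entries needed here, and the one-byte code length is taken from the codespace range <00> <FF> in the question's CMap.

```python
# Two entries copied from the ToUnicode CMap in the question: <05> <0030> and <06> <0031>.
TO_UNICODE = {0x05: "\u0030", 0x06: "\u0031"}

def decode_shown_string(raw, to_unicode, code_length=1):
    """Split raw bytes into character codes and map each one to Unicode."""
    chars = []
    for i in range(0, len(raw), code_length):
        code = int.from_bytes(raw[i:i + code_length], "big")
        # Codes missing from the map become U+FFFD (replacement character).
        chars.append(to_unicode.get(code, "\ufffd"))
    return "".join(chars)

print(decode_shown_string(bytes.fromhex("0605"), TO_UNICODE))  # prints 10
```

For fonts whose CMap declares a two-byte codespace, `code_length` would have to be 2 instead.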