PDF Tj command with angle brackets?

Question

I'm trying to figure out where the Times font is used in an uncompressed PDF v1.4 document.

The /Font object describing the Times font in the PDF is object 65:

65 0 obj
<</Type /Font
/Subtype /TrueType
/BaseFont /PXAAAD+TimesNewRoman,Italic
/FirstChar 1
/LastChar 35
/Widths [250 333 333 333 500 500 500 500 500 500 500 500 500 500 333 722 722 833 666 610 500 556 500 443 443 500 277 443 500 389 389 277 500 443 500]
/FontDescriptor 205 0 R
/ToUnicode 206 0 R>>
endobj

It refers to /FontDescriptor object 205, which further defines the Times font, and to a /ToUnicode map in object 206, which describes the byte-to-Unicode character mapping. Edit: after Ritsaert's initial answer below, I'm adding the font's /ToUnicode object here to provide the CMap it mentions.

206 0 obj
<</Length 208 0 R>>
stream
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
35 beginbfchar
<01> <0020>
<02> <0028>
<03> <0029>
<04> <002d>
<05> <0030>
<06> <0031>
<07> <0032>
...
<23> <0101>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end

endstream
endobj

I've now traced the use of the Times font object to a /Page object (one of many) like the following one, which refers to font object 65 through the /F4 entry in its page /Resources:

12 0 obj
<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 432 648]
/Contents 92 0 R
/Resources <</Font <</F1 62 0 R
/F3 64 0 R
/F4 65 0 R>>
/ProcSet [/PDF /Text]>>
/Group <</S /Transparency
/CS /DeviceRGB>>>>
endobj
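
The lookup from the resource name /F4 to object 65 can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the PDF is uncompressed, the page object has been copied out as text, and the /Font dictionary is written inline without nested dictionaries. The helper name `font_resources` is hypothetical and not part of any library.

```python
import re

# Page object 12 from the question, copied as plain text.
PAGE_OBJ = """<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 432 648]
/Contents 92 0 R
/Resources <</Font <</F1 62 0 R
/F3 64 0 R
/F4 65 0 R>>
/ProcSet [/PDF /Text]>>
/Group <</S /Transparency
/CS /DeviceRGB>>>>"""


def font_resources(page_text: str) -> dict:
    """Map font resource names (e.g. 'F4') to object numbers.

    Hypothetical helper: assumes an inline /Font dictionary that
    contains only indirect references of the form '/Name N G R'.
    """
    match = re.search(r"/Font\s*<<(.*?)>>", page_text, re.S)
    if match is None:
        return {}
    refs = re.findall(r"/(\w+)\s+(\d+)\s+\d+\s+R", match.group(1))
    return {name: int(obj_num) for name, obj_num in refs}


fonts = font_resources(PAGE_OBJ)
print(fonts["F4"])  # object number of the font behind /F4
```

A real PDF may keep /Resources in a separate indirect object or inherit it from a parent /Pages node, which this sketch does not handle.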

The /Contents stream (object 92 in the PDF file) is full of text objects (enclosed in BT and ET), none of which contains readable text; instead they use angle brackets full of numbers. For example, here is the only reference to the Times font /F4 whose use I'm trying to find:

92 0 obj
<</Length 93 0 R>>
stream
...
BT
0.5020 g
72.0000 615.1512 Td
/F4 12.0000 Tf
<0605> Tj
ET
...
endstream
endobj

But what do the angle brackets and the number <0605> refer to? A specific glyph in the font table? Looking at section 5.3.2 of the PDF reference, I can't find any mention of the angle brackets.

Given the above code and the accepted answer that <0605> is a hex encoding of text, the bytes of <0605> correspond to the entries <06> and <05> in the CMap of object 206, which map to the Unicode values <0031> and <0030> respectively. That is, the string <0605> refers to U+0031 (a "1") followed by U+0030 (a "0"), so the Times font is used for the string "10" on page object 12.

Recommended answer

Here is what is going on:

  • In the content stream, the Tj command is given the string <0605> to draw. A string between < and > is a hex string, so the characters with codes 6 and 5 are drawn. The notation is explained in section 3.2.3 of the linked PDF reference.
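
As a minimal sketch of the hex string notation, the following Python function turns a token such as <0605> into its bytes. The function name `parse_hex_string` is made up for this illustration; the two rules it applies (whitespace inside the brackets is ignored, and an odd number of digits is padded with a trailing 0) are the ones the PDF reference gives for hexadecimal strings.

```python
def parse_hex_string(token: str) -> bytes:
    """Decode a PDF hex string token such as '<0605>' into raw bytes."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError("not a hex string token")
    # Whitespace between the angle brackets is ignored.
    digits = "".join(token[1:-1].split())
    # With an odd number of digits, the final digit is assumed to be 0.
    if len(digits) % 2:
        digits += "0"
    return bytes.fromhex(digits)


codes = list(parse_hex_string("<0605>"))
print(codes)  # the character codes handed to Tj
```

For the string in the question this yields the two single-byte codes 6 and 5.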

Just before the text-drawing command, the font F4 is selected using the Tf command.
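
To find every string shown in a given font, one can walk the content stream and remember the most recent Tf operand. The sketch below assumes a deliberately naive whitespace tokenizer, which works for the excerpt in the question but not for literal strings in parentheses that contain spaces, nor for the TJ array operator. The function name `strings_by_font` is hypothetical.

```python
# Text object from content stream 92 in the question.
CONTENT = """BT
0.5020 g
72.0000 615.1512 Td
/F4 12.0000 Tf
<0605> Tj
ET"""


def strings_by_font(content: str) -> list:
    """Return (font resource name, string token) pairs, one per Tj."""
    current_font = None
    shown = []
    tokens = content.split()
    for i, tok in enumerate(tokens):
        if tok == "Tf" and i >= 2:
            # Operands precede the operator: /Name size Tf
            current_font = tokens[i - 2].lstrip("/")
        elif tok == "Tj" and i >= 1:
            # The single operand of Tj is the string to show.
            shown.append((current_font, tokens[i - 1]))
    return shown


pairs = strings_by_font(CONTENT)
print(pairs)
```

Filtering the result for the name F4 then gives the strings drawn in the Times font on that page.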

The page's resource dictionary maps F4 to object 65, generation 0. This font object is a subsetted TrueType font in which character codes 1..35 are defined. No /Encoding is specified, so the codes are presumably interpreted through the font program's built-in encoding rather than a standard one such as WinAnsiEncoding. In other words, the embedded subset font rearranges the characters in a non-standard manner, which occurs quite often.
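
The code range 1..35 lines up with the /FirstChar, /LastChar and /Widths entries of object 65: the width for character code c is the array element at index c - FirstChar. As a small worked example, the following Python computes the horizontal advance of <0605> at the 12 pt size set by Tf, assuming no character spacing, no word spacing and 100% horizontal scaling (none of which appear in the excerpt).

```python
FIRST_CHAR = 1
LAST_CHAR = 35
# /Widths array from font object 65, in 1/1000 of a text space unit.
WIDTHS = [250, 333, 333, 333, 500, 500, 500, 500, 500, 500, 500, 500,
          500, 500, 333, 722, 722, 833, 666, 610, 500, 556, 500, 443,
          443, 500, 277, 443, 500, 389, 389, 277, 500, 443, 500]


def glyph_width(code: int) -> int:
    """Look up the width of a character code in the /Widths array."""
    if not FIRST_CHAR <= code <= LAST_CHAR:
        raise ValueError("code outside FirstChar..LastChar")
    return WIDTHS[code - FIRST_CHAR]


font_size = 12.0  # from '/F4 12.0000 Tf'
advance = sum(glyph_width(c) for c in (6, 5)) / 1000 * font_size
print(advance)  # horizontal advance of the two glyphs, in text space units
```

Both codes have width 500, which is plausible for digits in a Times face and consistent with the codes mapping to "1" and "0".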

If you want to know how these character codes map to Unicode characters: the font has a /ToUnicode entry pointing to a stream that contains a CMap defining the mapping. This should be sufficient to convert the string to a Unicode string.
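
The conversion can be sketched as follows. This is a rough illustration, not a full CMap parser: it reads only beginbfchar sections, ignores beginbfrange, and assumes single-byte character codes, which matches the <00> <FF> codespace range in object 206. The CMap text is the excerpt from the question with the elided entries simply left out, and the helper names `parse_bfchar` and `decode` are hypothetical.

```python
import re

# bfchar section of object 206 as quoted in the question
# (entries hidden behind "..." there are not reproduced here).
CMAP = """35 beginbfchar
<01> <0020>
<02> <0028>
<03> <0029>
<04> <002d>
<05> <0030>
<06> <0031>
<07> <0032>
<23> <0101>
endbfchar"""


def parse_bfchar(cmap_text: str) -> dict:
    """Build {character code: Unicode string} from bfchar sections."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for src, dst in pairs:
            # Destination values are UTF-16BE code units.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode(codes: bytes, mapping: dict) -> str:
    """Translate single-byte codes; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


to_unicode = parse_bfchar(CMAP)
text = decode(bytes.fromhex("0605"), to_unicode)
print(text)  # → 10
```

With the mapping from object 206, the bytes 06 05 decode to "10", which agrees with the conclusion reached in the question.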
