/Differences dictionary for encode parsing issue in PDF

Views: 116 Published: 2020/5/25 5:01:32 Tags: pdf embedded-fonts


Problem description

A Type 1 font /Differences encoding uses strings (glyph names) in its mapping of values; for example, the character 1 is encoded as 'one'. It is used for numbers and special characters only.

What is the standard way to use these encodings?

How should I decode a string from a PDF that uses such an encoding?

Link for the file: http://www.filedropper.com/open

Solution

Here's the /Differences array in your file (and honestly, you should have just posted this rather than a link to a skeevy download page):

/Differences [
    24 /breve/caron/circumflex/dotaccent/hungarumlaut/ogonek/ring/tilde
    39 /quotesingle
    96 /grave
    128 /bullet/dagger/daggerdbl/ellipsis...
]

The way this works is that the font also has a base encoding associated with it (for example /MacRomanEncoding or /WinAnsiEncoding). In the case of a Type 1 font, there is an encoding built into the font. Take a copy of that encoding and apply the differences to it. Each number in the array sets a starting code, and the names that follow it are assigned to consecutive codes from there. Your first number is 24, so you change entries 24-31 inclusive to /breve, /caron, /circumflex and so on.
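To make that concrete, here is a minimal Python sketch of applying a /Differences array to a copy of a base encoding. It assumes the array has already been parsed into a flat list of integers and name strings; the function name `apply_differences` and the all-`.notdef` base encoding are placeholders of mine, not anything from a PDF library. Only the names visible in the array above are included, since the original listing is truncated after /ellipsis.

```python
from typing import List, Union

NOTDEF = ".notdef"


def apply_differences(base_encoding: List[str],
                      differences: List[Union[int, str]]) -> List[str]:
    """Return a copy of base_encoding with a /Differences array applied."""
    if len(base_encoding) != 256:
        raise ValueError("expected a 256-entry base encoding")
    encoding = list(base_encoding)  # work on a copy, leave the base untouched
    code = None
    for item in differences:
        if isinstance(item, int):
            # A number resets the code that the next glyph name is assigned to.
            code = item
        else:
            if code is None:
                raise ValueError("glyph name appears before any starting code")
            if not 0 <= code <= 255:
                raise ValueError(f"code {code} is outside 0..255")
            encoding[code] = item
            code += 1  # following names go to consecutive codes
    return encoding


# Placeholder base encoding: a real one would come from the font or from a
# named encoding, not from a table of .notdef entries.
base = [NOTDEF] * 256

# The visible part of the array from the question (the rest is elided there).
differences = [
    24, "breve", "caron", "circumflex", "dotaccent",
        "hungarumlaut", "ogonek", "ring", "tilde",
    39, "quotesingle",
    96, "grave",
    128, "bullet", "dagger", "daggerdbl", "ellipsis",
]

encoding = apply_differences(base, differences)
print(encoding[24:32])
print(encoding[39], encoding[96], encoding[128])
```

Working on a copy matters because the same base encoding may be shared by several font dictionaries, each with its own /Differences array.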

In Type 1 fonts, there is a dictionary called /CharStrings, which associates the name of a glyph with the data/code that will render it. If, for example, you get a character with code 26, you look it up in your encoding array (which should be a 256-element array for Type 1 fonts) and, with the differences applied, you get the name /circumflex. You then look that name up in the CharStrings dictionary, pull out the glyph data and render it. Any character that does not exist in the encoding should be set to /.notdef, which will then render a shape representing an undefined character (usually an empty box).
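The lookup chain described here (code, then glyph name, then CharStrings entry) might be sketched in Python as follows. Everything in it is illustrative: `encoding` is a hand-built stand-in for the 256-entry array after the differences are applied, `charstrings` is a stand-in dictionary whose byte strings merely represent glyph programs, and the helper names are mine. Falling back to /.notdef when a name is missing from the CharStrings stand-in is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

NOTDEF = ".notdef"

# Stand-in for the 256-entry encoding after /Differences has been applied.
encoding: List[str] = [NOTDEF] * 256
encoding[24:32] = ["breve", "caron", "circumflex", "dotaccent",
                   "hungarumlaut", "ogonek", "ring", "tilde"]
encoding[39] = "quotesingle"

# Stand-in for the font's /CharStrings dictionary. In a real font the values
# are glyph programs; here they are just labelled placeholders.
charstrings: Dict[str, bytes] = {
    NOTDEF: b"<notdef program>",
    "circumflex": b"<circumflex program>",
    "quotesingle": b"<quotesingle program>",
}


def glyph_for_code(code: int,
                   encoding: List[str],
                   charstrings: Dict[str, bytes]) -> Tuple[str, bytes]:
    """Map a character code to (glyph name, glyph data)."""
    name = encoding[code] if 0 <= code < len(encoding) else NOTDEF
    if name not in charstrings:
        # Assumption of this sketch: unknown names render as .notdef.
        name = NOTDEF
    return name, charstrings[name]


def glyph_names_for_string(data: bytes,
                           encoding: List[str],
                           charstrings: Dict[str, bytes]) -> List[str]:
    """Map every byte of a single-byte-encoded PDF string to a glyph name."""
    return [glyph_for_code(byte, encoding, charstrings)[0] for byte in data]


print(glyph_for_code(26, encoding, charstrings))
print(glyph_names_for_string(bytes([26, 39, 200]), encoding, charstrings))
```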

Now, your likely problem is: how do I turn these glyph names into something more useful, like, say, Unicode?

If you look in Annex D of the PDF specification, you'll see a set of tables that define the character sets for the standard Latin encodings. You would make a lookup table that maps Adobe standard glyph names to Unicode. Unfortunately, the tables in Annex D are incomplete. Fortunately, Adobe has a file that defines all of that for you here. There is a link in that file which is now dead, but most likely it was meant to go here.
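As a rough illustration of such a lookup table, the Python sketch below maps the glyph names from the array above (plus 'one' from the question) to Unicode and decodes a byte string through an encoding. The table is a tiny hand-written excerpt, not the full Adobe list; the function names, the U+FFFD fallback for unknown names, and the handling of only the simple single-group `uniXXXX` name form are assumptions of this sketch.

```python
import re
from typing import List

# Hand-written excerpt of a glyph-name-to-Unicode table. A real decoder would
# load the complete list published by Adobe instead of this small sample.
GLYPH_TO_UNICODE = {
    "breve": "\u02d8",
    "caron": "\u02c7",
    "circumflex": "\u02c6",
    "dotaccent": "\u02d9",
    "hungarumlaut": "\u02dd",
    "ogonek": "\u02db",
    "ring": "\u02da",
    "tilde": "\u02dc",
    "quotesingle": "\u0027",
    "grave": "\u0060",
    "bullet": "\u2022",
    "dagger": "\u2020",
    "daggerdbl": "\u2021",
    "ellipsis": "\u2026",
    "one": "\u0031",
}

# Simplified: accepts only one group of four uppercase hex digits.
_UNI_NAME = re.compile(r"uni([0-9A-F]{4})")

REPLACEMENT = "\ufffd"  # assumption: unknown glyph names decode to U+FFFD


def glyph_name_to_unicode(name: str) -> str:
    """Translate a glyph name to a Unicode string, or U+FFFD if unknown."""
    if name in GLYPH_TO_UNICODE:
        return GLYPH_TO_UNICODE[name]
    match = _UNI_NAME.fullmatch(name)
    if match:
        return chr(int(match.group(1), 16))
    return REPLACEMENT


def decode_pdf_string(data: bytes, encoding: List[str]) -> str:
    """Decode single-byte character codes via a 256-entry glyph-name encoding."""
    return "".join(glyph_name_to_unicode(encoding[byte]) for byte in data)


# Stand-in encoding with a few entries filled in for the demonstration.
encoding = [".notdef"] * 256
encoding[39] = "quotesingle"
encoding[49] = "one"
encoding[128] = "bullet"

print(decode_pdf_string(bytes([128, 39, 49]), encoding))
```

If the font dictionary carries a /ToUnicode CMap, that is normally the better source for text extraction; a name-based table like this one is the fallback when it is missing.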
