Embedded fonts in PDF: copy and paste problems


Question

When I try to copy and paste into an MS Word document from a PDF document that has some sets of fonts embedded, the result is illegible.

Several symbols are changed or even disappear.

Using Adobe Acrobat, I can check which specific fonts are embedded.

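  • Would installing such fonts in Microsoft Word solve the problem?
  • If so, where can I get or even create those subsets of the fonts I need?
  • If not, how could I solve this problem?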

Recommended answer

You should first check your PDF document's fonts with the help of the pdffonts utility. It is part of the XPDF package for Windows and can be used without installing, directly from a command prompt (DOS box).

In order to successfully extract text (or copy'n'paste it) from a PDF, the font should use a standard encoding (not a Custom one), and it should have a /ToUnicode table associated with it inside the PDF.

pdffonts reports a few basic pieces of information about the fonts used by your PDF.

Example output:

$ pdffonts -f 3 -l 5 sample.pdf
  name                      type          encoding     emb sub uni object ID
  ------------------------- ------------- ------------ --- --- --- ---------
  IADKRB+Arial-BoldMT       CID TrueType  Identity-H   yes yes yes     10  0
  SSKFGJ+ArialMT            CID TrueType  Custom       yes yes no      11  0

The command above asks for the fonts used in the page range 3 (first page to check) to 5 (last page to check).

In the above case, both fonts are embedded as subsets (indicated by the XYZABC+ prefixes to their names, as well as by the yes in the emb and the sub columns).
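The subset prefix mentioned above can be detected mechanically. The following is a minimal sketch, assuming the usual PDF convention of a six-uppercase-letter tag followed by a plus sign; the function name split_subset_tag is made up for this illustration.

```python
import re

# Subset fonts carry a tag of six uppercase letters plus "+" in front of
# the font name, e.g. "IADKRB+Arial-BoldMT".
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None when there is no subset prefix.

    Hypothetical helper for illustration only.
    """
    match = SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name


print(split_subset_tag("IADKRB+Arial-BoldMT"))  # ('IADKRB', 'Arial-BoldMT')
print(split_subset_tag("ArialMT"))              # (None, 'ArialMT')
```

A font that reports a tag here contains only the glyphs the document actually uses, which is one reason installing the "same" font in Word is unlikely to help.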

The font SSKFGJ+ArialMT uses a custom encoding, but the PDF has no /ToUnicode table for this font, as indicated by the no entry in the column headed uni.

Hence, it is not easy to extract text that is shown with this font (extraction would require manual reverse engineering -- but then you can also just "read" the PDF pages).
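For documents with many fonts, the check described above can be automated. The sketch below parses text in the column layout of the sample output and lists the fonts whose uni column says no. It assumes that exact layout (other pdffonts versions may print different columns) and that font names and encodings contain no spaces; parse_pdffonts and fonts_without_tounicode are hypothetical helper names.

```python
SAMPLE = """\
name                      type          encoding     emb sub uni object ID
------------------------- ------------- ------------ --- --- --- ---------
IADKRB+Arial-BoldMT       CID TrueType  Identity-H   yes yes yes     10  0
SSKFGJ+ArialMT            CID TrueType  Custom       yes yes no      11  0
"""


def parse_pdffonts(output):
    """Parse pdffonts output that uses the column layout shown in SAMPLE."""
    fonts = []
    seen_separator = False
    for line in output.splitlines():
        if not seen_separator:
            # Everything up to and including the dashed line is header.
            seen_separator = line.strip().startswith("---")
            continue
        tokens = line.split()
        if len(tokens) < 8:
            continue  # not a complete font row
        # The type can contain spaces ("CID TrueType"), so the fixed
        # columns are read from the right-hand side.
        fonts.append({
            "name": tokens[0],
            "type": " ".join(tokens[1:-6]),
            "encoding": tokens[-6],
            "emb": tokens[-5] == "yes",
            "sub": tokens[-4] == "yes",
            "uni": tokens[-3] == "yes",
            "object": (int(tokens[-2]), int(tokens[-1])),
        })
    return fonts


def fonts_without_tounicode(fonts):
    """Names of fonts that have no /ToUnicode table (uni column is 'no')."""
    return [font["name"] for font in fonts if not font["uni"]]


print(fonts_without_tounicode(parse_pdffonts(SAMPLE)))  # ['SSKFGJ+ArialMT']
```

Text set in any font this reports is the text most likely to come out garbled when copied.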

You should first check whether copy'n'pasting of text works if you use a simple text file as the target (not an MS Word document). If it doesn't, you can already forget about MS Word...
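If you want a rough, automated second opinion on the text you pasted into the plain text file, something like the following could help. It is only a heuristic of my own, not part of any PDF tool: it counts replacement characters, private-use code points and control characters, which often show up when a font has no usable Unicode mapping. It will not catch text that was mapped to the wrong but otherwise ordinary letters. The name suspicious_ratio is made up for this sketch.

```python
import unicodedata


def suspicious_ratio(text):
    """Share of non-whitespace characters that are unlikely to be real text.

    Heuristic only: flags U+FFFD, private-use and control characters.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    suspicious = sum(
        1
        for ch in visible
        if ch == "\ufffd"  # Unicode replacement character
        or unicodedata.category(ch) in ("Co", "Cc")  # private use, control
    )
    return suspicious / len(visible)


print(suspicious_ratio("Hello, world"))      # 0.0
print(suspicious_ratio("\ue001\ue002ab"))    # 0.5
```

A ratio well above zero suggests that the copy failed at the PDF level, so changing anything on the Word side will not help.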

  • Would installing such fonts in Microsoft Word solve the problem?

    Very likely: no. (I cannot give a definite answer without having access to the PDF in question myself.)

  • If so, where can I get or even create those subsets of the fonts I need?

    You could extract the subsetted fonts from the PDF itself. (Funnily, my most popular StackOverflow answer deals with exactly that question -- I dunno why people seem to be so crazy about extracting fonts from PDF files other than for debugging purposes...)

  • If not, how could I solve this problem?

    There is no solution other than doing it manually.

Unfortunately, you cannot get exactly the same info about the fonts used by a PDF via Acrobat or Adobe Reader. What you can get via Menu -> File -> Properties... is

  • the font name,
  • the subset info (but not the prefix used for the subset font name),
  • the encoding, and
  • the font type.

But you do not get the info about the presence of a /ToUnicode table.
