使用PyPDF2检测Google Docs生成的PDF文件中的非嵌入字体 [英] Use PyPDF2 to detect non-embedded fonts in PDF file generated by Google Docs

查看：0 发布时间：2022/7/19 14:21:28 python pdf fonts google-docs pypdf2

本文介绍了使用PyPDF2检测Google Docs生成的PDF文件中的非嵌入字体的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我希望有人能帮我编写一个Python函数来检测文件中没有嵌入到文件中的任何字体。我尝试使用here链接的脚本，它可以检测文档字体，但不能检测嵌入的字体。为方便起见，我粘贴了以下脚本：

from PyPDF2 import PdfFileReader
import sys

fontkeys = set(['/FontFile', '/FontFile2', '/FontFile3'])

def walk(obj, fnt, emb):
    if '/BaseFont' in obj:
        fnt.add(obj['/BaseFont'])

    elif '/FontName' in obj and fontkeys.intersection(set(obj)):
        emb.add(obj['/FontName'])

    for k in obj:
        if hasattr(obj[k], 'keys'):
            walk(obj[k], fnt, emb)

    return fnt, emb

if __name__ == '__main__':
    fname = sys.argv[1]
    pdf = PdfFileReader(fname)
    fonts = set()
    embedded = set()

    for page in pdf.pages:
        obj = page.getObject()
        f, e = walk(obj['/Resources'], fonts, embedded)
        fonts = fonts.union(f)
        embedded = embedded.union(e)

    unembedded = fonts - embedded
    print 'Font List'
    pprint(sorted(list(fonts)))
    if unembedded:
        print '
Unembedded Fonts'
        pprint(unembedded)

例如，我从Google Docs(输入一些内容，另存为PDF)下载了一个带有Arial字体的PDF，Adobe Reader已经确认该字体是嵌入的。但是，该脚本返回[‘/ArialMT’]作为字体，并返回嵌入字体的空集。此外，看起来没有任何递归对象具有键{'/FontFile', '/FontFile2', '/FontFile3'}。我已经在其他PDF上试过了，它也很管用，所以Google Docs的PDF一定有什么奇怪的地方。让我知道我可以为这个PDF文件提供哪些其他调试信息。

我认为有可能Google Docs只嵌入了14种标准PDF字体中没有的字体。然而，我尝试了一种奇怪的字体(Pacsigno)，脚本还指出这种字体没有嵌入，而Adobe声称它是嵌入的。

我使用this PDF进行了尝试，脚本正确地指出这14种字体已嵌入。

推荐答案

问题是此脚本不处理列表。例如，在Google Docs示例中，在PDF对象中，您可以看到以下结构：

{'/Encoding': '/Identity-H', '/Type': '/Font', '/BaseFont': '/Pacifico-Regular', '/ToUnicode': IndirectObject(9, 0), '/DescendantFonts': [IndirectObject(16, 0)], '/Subtype': '/Type0'}

键DescendantFonts映射到一个值列表，如果您更深入地递归，它将包含字体文件的键。您还必须修改脚本以测试数组，例如：

if type(obj) == PyPDF2.generic.ArrayObject:  # You can also do ducktyping here
    for i in obj:
        if hasattr(i, 'keys'):
            walk(i, all_fonts, embedded_fonts)

这篇关于使用PyPDF2检测Google Docs生成的PDF文件中的非嵌入字体的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用PyPDF2检测Google Docs生成的PDF文件中的非嵌入字体 [英] Use PyPDF2 to detect non-embedded fonts in PDF file generated by Google Docs

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

使用PyPDF2检测Google Docs生成的PDF文件中的非嵌入字体 [英] Use PyPDF2 to detect non-embedded fonts in PDF file generated by Google Docs

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭