Why is a PDF containing only one field around 500 KB?


Question


Here you can download a PDF with one AcroForm field; its size is exactly 427 KB.

If I remove this single field, the file is only 3 KB. Why does this happen? I tried analysing it with PDF Debugger and nothing looks odd to me.

Solution

There's an embedded "Arial" font in the acroform default resources, see Root/AcroForm/DR/Font/Arial/FontDescriptor/FontFile2.

Either you or whoever created the PDF added it for no reason: the font is not used or referenced anywhere. To find out whether a font in the acroform default resources is used, you can check whether the /DA entry (default appearance) of each field contains the font name.
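The /DA check can be made a bit more precise than a plain substring test. Below is a minimal, stdlib-only sketch, assuming the /DA string consists of whitespace-separated tokens such as `/Arial 12 Tf 0 g`; it is a simplified tokenizer, not a full content-stream parser. The class and method names (`DaFontCheck`, `fontNameFromDa`, `unusedFonts`) are hypothetical and not part of PDFBox. The font name is taken as the operand two tokens before the `Tf` operator, and Helv and ZaDb are always kept, as in the PDFBox code of this answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of PDFBox.
public class DaFontCheck {

    // Returns the font resource name used by the "Tf" operator in a /DA string,
    // e.g. "/Arial 12 Tf 0 g" -> "Arial", or null if there is no Tf operator.
    // Assumption: tokens are separated by whitespace.
    static String fontNameFromDa(String da) {
        if (da == null) {
            return null;
        }
        String[] tokens = da.trim().split("\\s+");
        // Tf takes two operands (font name, size), so the name sits two tokens before "Tf".
        for (int i = 2; i < tokens.length; i++) {
            if (tokens[i].equals("Tf") && tokens[i - 2].startsWith("/")) {
                return tokens[i - 2].substring(1);
            }
        }
        return null;
    }

    // Returns the default-resource font names that no /DA string refers to,
    // always keeping the Adobe defaults Helv and ZaDb.
    static List<String> unusedFonts(Set<String> resourceFonts, List<String> defaultAppearances) {
        Set<String> used = new LinkedHashSet<>();
        for (String da : defaultAppearances) {
            String name = fontNameFromDa(da);
            if (name != null) {
                used.add(name);
            }
        }
        List<String> unused = new ArrayList<>();
        for (String font : resourceFonts) {
            if (!font.equals("Helv") && !font.equals("ZaDb") && !used.contains(font)) {
                unused.add(font);
            }
        }
        return unused;
    }

    public static void main(String[] args) {
        Set<String> fonts = new LinkedHashSet<>(List.of("Helv", "ZaDb", "Arial", "F1"));
        List<String> das = List.of("/F1 0 Tf 0 g");
        System.out.println(unusedFonts(fonts, das)); // prints [Arial]
    }
}
```

With PDFBox, `resourceFonts` would hold `fontName.getName()` for each entry of `defaultResources.getFontNames()`, and `defaultAppearances` would be the list collected from the fields.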

When you removed the field, you apparently also removed the font from the acroform default resources (you didn't write how you removed it).

Here's some code to do it (null checks mostly missing):

    PDAcroForm acroForm = doc.getDocumentCatalog().getAcroForm();
    PDResources defaultResources = acroForm.getDefaultResources();
    COSDictionary fontDict = (COSDictionary) defaultResources.getCOSObject().getDictionaryObject(COSName.FONT);
    List<String> defaultAppearances = new ArrayList<>();
    List<COSName> fontDeletionList = new ArrayList<>();
    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDVariableText)
        {
            PDVariableText vtField = (PDVariableText) field;
            defaultAppearances.add(vtField.getDefaultAppearance());
        }
    }
    for (COSName fontName : defaultResources.getFontNames())
    {
        if (COSName.HELV.equals(fontName) || COSName.ZA_DB.equals(fontName))
        {
            // Adobe default, always keep
            continue;
        }
        boolean found = false;
        for (String da : defaultAppearances)
        {
            if (da != null && da.contains("/" + fontName.getName()))
            {
                found = true;
                break;
            }
        }
        System.out.println(fontName + ": " + found);
        if (!found)
        {
            fontDeletionList.add(fontName);
        }
    }
    System.out.println("deletion list: " + fontDeletionList);
    for (COSName fontName : fontDeletionList)
    {
        fontDict.removeItem(fontName);
    }

The resulting file is now 5 KB.

I haven't checked the annotations. Some of them also have a /DA string, but it is unclear whether the acroform default resource fonts are to be used when reconstructing a missing appearance stream.

Update: Here's some additional code to replace Arial with Helv:

    for (PDField field : acroForm.getFieldTree())
    {
        if (field instanceof PDVariableText)
        {
            PDVariableText vtField = (PDVariableText) field;
            String defaultAppearance = vtField.getDefaultAppearance();
            if (defaultAppearance != null && defaultAppearance.startsWith("/Arial"))
            {
                vtField.setDefaultAppearance("/Helv " + defaultAppearance.substring(7));
                vtField.getWidgets().get(0).setAppearance(null); // this removes the font usage
                vtField.setValue(vtField.getValueAsString()); // set the value again so a new appearance is built
            }
            defaultAppearances.add(vtField.getDefaultAppearance());
        }
    }
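The `substring(7)` call assumes that the /DA string starts with `/Arial` followed by exactly one space, and `startsWith("/Arial")` would also match a font named, say, `/ArialBold`. A token-based replacement avoids both. This is a minimal sketch, again assuming whitespace-separated /DA tokens; `DaFontReplace.replaceFont` is a hypothetical helper, not a PDFBox method.

```java
// Hypothetical helper class, not part of PDFBox.
public class DaFontReplace {

    // Rewrites the font operand of the first "Tf" operator whose font is oldFont,
    // e.g. replaceFont("/Arial 12 Tf 0 g", "Arial", "Helv") -> "/Helv 12 Tf 0 g".
    // Returns the input unchanged if oldFont is not used by a Tf operator.
    // Note: when a replacement happens, runs of whitespace are normalized to single spaces.
    static String replaceFont(String da, String oldFont, String newFont) {
        if (da == null) {
            return null;
        }
        String[] tokens = da.trim().split("\\s+");
        for (int i = 2; i < tokens.length; i++) {
            // Exact token match, so "/ArialBold" is not mistaken for "/Arial".
            if (tokens[i].equals("Tf") && tokens[i - 2].equals("/" + oldFont)) {
                tokens[i - 2] = "/" + newFont;
                return String.join(" ", tokens);
            }
        }
        return da;
    }

    public static void main(String[] args) {
        System.out.println(replaceFont("/Arial 12 Tf 0 g", "Arial", "Helv")); // prints /Helv 12 Tf 0 g
        System.out.println(replaceFont("0 g /Arial 0 Tf", "Arial", "Helv"));  // prints 0 g /Helv 0 Tf
    }
}
```

In the loop above, the two lines that test and rewrite the /DA string could then become a call like `vtField.setDefaultAppearance(DaFontReplace.replaceFont(defaultAppearance, "Arial", "Helv"))`, with the appearance reset only when the result differs from the input.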

Note that this may not be a good idea, because the standard 14 fonts support only a limited character set. Try

    vtField.setValue("Ayşe");

and you'll get an exception.
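One way to check in advance whether a value will fit: the /Helv font in acroform default resources typically uses WinAnsiEncoding, which is close to (but not exactly the same as) the windows-1252 code page. The sketch below uses windows-1252 as a stand-in, so treat it as an approximation; `WinAnsiCheck` is a hypothetical name. "ş" (U+015F) is not in windows-1252, which is presumably why the `setValue` call above fails.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class; windows-1252 is only an approximation of WinAnsiEncoding.
public class WinAnsiCheck {

    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Returns true if every character of the text can be encoded in windows-1252.
    static boolean fitsWinAnsi(String text) {
        // A fresh encoder per call, because CharsetEncoder is not thread-safe.
        CharsetEncoder encoder = WINDOWS_1252.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("Ayse"));       // prints true
        System.out.println(fitsWinAnsi("Ay\u015Fe"));  // "Ayşe": prints false, U+015F is not in windows-1252
    }
}
```

A check like this could guard the switch to /Helv: keep the embedded font for fields whose values do not pass it.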

More general code to replace a font can be found in this answer.
