How to merge PDFs into a single file without multiple copies of the same font?

Question

I create PDFs and concatenate them into a single PDF.

The resulting PDF is much larger than I expected.

I realised that my output PDF contains many duplicate copies of the same font, and that this is the reason for the unexpectedly large file size.

Here, my question is:

I would like to create PDFs that don't embed the fonts but only contain the font information, so that they use the Windows system fonts.

Then, when I merge them into a single PDF, I would insert the actual fonts that the PDF needs.

If possible, please let me know how to do it.

Solution

I've created the MergeAndAddFont example to explain the different options.

We'll create PDFs using this code snippet:

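// iText 5 imports used by this snippet
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;

// FONT is a constant holding the path to the font file (Gravitas One in this example)
public void createPdf(String filename, String text, boolean embedded, boolean subset) throws DocumentException, IOException {
    // step 1: create the document
    Document document = new Document();
    // step 2: attach a writer that writes to the file
    PdfWriter.getInstance(document, new FileOutputStream(filename));
    // step 3: open the document
    document.open();
    // step 4: add text using a font that is (not) embedded and (not) subset
    BaseFont bf = BaseFont.createFont(FONT, BaseFont.WINANSI, embedded);
    bf.setSubset(subset);
    Font font = new Font(bf, 12);
    document.add(new Paragraph(text, font));
    // step 5: close the document
    document.close();
}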

We use this code to create three test files (1, 2 and 3), and we do this three times (A, B and C).
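The three-by-three test matrix can be sketched in plain Java. The enum below records the flags the text gives for each run: A is embedded and subset, B is embedded with the full font, and C is not embedded. It also builds the file names. The iText call is only indicated in a comment, because this sketch has no iText dependency. The names Run and TestMatrix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the three runs and their createPdf flags.
enum Run {
    A(true, true),   // embedded, subset
    B(true, false),  // embedded, full font
    C(false, false); // not embedded

    final boolean embedded;
    final boolean subset;

    Run(boolean embedded, boolean subset) {
        this.embedded = embedded;
        this.subset = subset;
    }
}

class TestMatrix {
    // Returns the nine file names, "testA1.pdf" through "testC3.pdf".
    static List<String> fileNames() {
        List<String> names = new ArrayList<>();
        for (Run run : Run.values()) {
            for (int i = 1; i <= 3; i++) {
                names.add("test" + run.name() + i + ".pdf");
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (Run run : Run.values()) {
            for (int i = 1; i <= 3; i++) {
                // With iText this would be: createPdf(name, text, run.embedded, run.subset)
                System.out.printf("test%s%d.pdf embedded=%b subset=%b%n",
                        run.name(), i, run.embedded, run.subset);
            }
        }
    }
}
```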

The first time, we use the parameters embedded = true and subset = true, resulting in the files testA1.pdf with text "abcdefgh" (3.71 KB), testA2.pdf with text "ijklmnopq" (3.49 KB) and testA3.pdf with text "rstuvwxyz" (3.55 KB). The font is embedded and the file size is relatively low because we only embed a subset of the font.

Now we merge these files using the following code, using the smart parameter to indicate whether we want to use PdfCopy or PdfSmartCopy:

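// iText 5 imports used by this snippet
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSmartCopy;

public void mergeFiles(String[] files, String result, boolean smart) throws IOException, DocumentException {
    Document document = new Document();
    // PdfSmartCopy reuses identical objects; PdfCopy copies every object as-is
    PdfCopy copy;
    if (smart)
        copy = new PdfSmartCopy(document, new FileOutputStream(result));
    else
        copy = new PdfCopy(document, new FileOutputStream(result));
    document.open();
    // one reader per input file
    PdfReader[] reader = new PdfReader[files.length];
    for (int i = 0; i < files.length; i++) {
        reader[i] = new PdfReader(files[i]);
        copy.addDocument(reader[i]);
    }
    document.close();
    // close the readers once the merged document has been written
    for (int i = 0; i < reader.length; i++) {
        reader[i].close();
    }
}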

When we merge the documents, be it with PdfCopy or PdfSmartCopy, the different subsets of the same font are copied as separate objects into the resulting PDFs testA_merged1.pdf and testA_merged2.pdf (both 9.75 KB).

This is the problem you are experiencing: PdfSmartCopy can detect and reuse identical objects, but the different subsets of the same font aren't identical and iText can't merge different subsets of the same font into one font.
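ISO 32000-1 (section 9.6.4) requires the name of a font subset to start with a tag of exactly six uppercase letters followed by a plus sign. Different subsets in the same file must use different tags. That makes the duplicates easy to spot: the sketch below strips the tag and counts how many subsets of each base font a file carries. Names such as AAAAAB+GravitasOne are hypothetical examples, and reading the names out of a real PDF is not shown.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubsetNames {
    // Subset tag: exactly six uppercase letters followed by '+' (ISO 32000-1, 9.6.4).
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+(.+)$");

    // Returns the base font name with any subset tag removed.
    static String baseName(String fontName) {
        Matcher m = SUBSET_TAG.matcher(fontName);
        return m.matches() ? m.group(1) : fontName;
    }

    static boolean isSubset(String fontName) {
        return SUBSET_TAG.matcher(fontName).matches();
    }

    // Counts font names per base font, e.g. how many subsets of one font a merged file holds.
    static Map<String, Integer> countByBaseFont(List<String> fontNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : fontNames) {
            counts.merge(baseName(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical names, as three merged subsets of one font might appear.
        List<String> names = List.of("AAAAAB+GravitasOne", "AAAAAC+GravitasOne", "AAAAAD+GravitasOne");
        System.out.println(countByBaseFont(names)); // {GravitasOne=3}
    }
}
```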

The second time, we use the parameters embedded = true and subset = false, resulting in the files testB1.pdf (21.38 KB), testB2.pdf (21.38 KB) and testB3.pdf (21.38 KB). The font is fully embedded, and each single file is a lot bigger than before because the full font is embedded.

If we merge the files using PdfCopy, the font will be present in the merged document redundantly, resulting in the bloated file testB_merged1.pdf (63.16 KB). This is definitely not what you want!

However, if we use PdfSmartCopy, iText detects an identical font stream and reuses it, resulting in testB_merged2.pdf (21.95 KB), which is much smaller than what we had with PdfCopy. It's still bigger than the document with the subsetted fonts, but if you're concatenating a large number of files, the result will be better if you embed the complete font.
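The reuse rule can be shown with a toy model. This is not iText's actual PdfSmartCopy code. Each stream is keyed by a SHA-256 digest of its bytes, so byte-identical full fonts (run B) collapse into one stored object. The three different subsets (run A) remain three objects. The placeholder strings stand in for font programs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of content-based object reuse; NOT iText's PdfSmartCopy implementation.
class ContentDedup {
    private final Map<String, Integer> objectByDigest = new HashMap<>();
    private int nextObject = 1;

    // Returns the object number for this stream, reusing the existing number
    // when byte-identical content has already been stored.
    int add(byte[] stream) {
        return objectByDigest.computeIfAbsent(digest(stream), d -> nextObject++);
    }

    int storedCount() {
        return objectByDigest.size();
    }

    private static String digest(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Adds each placeholder stream and reports how many distinct objects were stored.
    static int storedFor(List<String> streams) {
        ContentDedup copy = new ContentDedup();
        for (String s : streams) {
            copy.add(s.getBytes(StandardCharsets.UTF_8));
        }
        return copy.storedCount();
    }

    public static void main(String[] args) {
        // Run A: each subset holds different glyphs, so the bytes differ.
        System.out.println(storedFor(List.of("glyphs:abcdefgh", "glyphs:ijklmnopq", "glyphs:rstuvwxyz"))); // 3
        // Run B: the full font is byte-identical in every file.
        System.out.println(storedFor(List.of("glyphs:a-z", "glyphs:a-z", "glyphs:a-z"))); // 1
    }
}
```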

The third time, we use the parameters embedded = false and subset = false, resulting in the files testC1.pdf (2.04 KB), testC2.pdf (2.04 KB) and testC3.pdf (2.04 KB). The font isn't embedded, resulting in an excellent file size, but if you compare with one of the previous results, you'll see that the font looks completely different.

We merge the files using PdfSmartCopy, resulting in testC_merged1.pdf (2.6 KB). Again, we have an excellent file size, but again we have the problem that the font isn't visualized correctly.

To fix this, we need to embed the font:

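// iText 5 imports used by this snippet
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfIndirectReference;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfStream;

private void embedFont(String merged, String fontfile, String result) throws IOException, DocumentException {
    // read the font file into memory
    RandomAccessFile raf = new RandomAccessFile(fontfile, "r");
    byte[] fontbytes = new byte[(int)raf.length()];
    raf.readFully(fontbytes);
    raf.close();
    // create a new stream for the font file: Flate-compressed,
    // with /Length1 holding the uncompressed length of the TrueType program
    PdfStream stream = new PdfStream(fontbytes);
    stream.flateCompress();
    stream.put(PdfName.LENGTH1, new PdfNumber(fontbytes.length));
    // create a reader for the merged file and a stamper for the result
    PdfReader reader = new PdfReader(merged);
    int n = reader.getXrefSize();
    PdfObject object;
    PdfDictionary font;
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(result));
    // the PostScript name, as found in the /FontName entry of the font descriptor
    PdfName fontname = new PdfName(BaseFont.createFont(fontfile, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED).getPostscriptFontName());
    // the font stream is added to the body on the first match and then shared
    PdfIndirectReference fontref = null;
    for (int i = 0; i < n; i++) {
        object = reader.getPdfObject(i);
        if (object == null || !object.isDictionary())
            continue;
        font = (PdfDictionary)object;
        // look for font descriptors of the (non-embedded) font
        if (PdfName.FONTDESCRIPTOR.equals(font.get(PdfName.TYPE))
            && fontname.equals(font.get(PdfName.FONTNAME))) {
            if (fontref == null)
                fontref = stamper.getWriter().addToBody(stream).getIndirectReference();
            // FontFile2 is the key for an embedded TrueType font program
            font.put(PdfName.FONTFILE2, fontref);
        }
    }
    stamper.close();
    reader.close();
}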
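In embedFont, flateCompress() and the LENGTH1 entry produce a stream whose /Length is the compressed size. Its /Length1 records the decoded length of the whole TrueType program, which ISO 32000-1 requires for embedded TrueType fonts. The sketch below shows that bookkeeping using only java.util.zip. FlateDecode data uses the zlib format, which Deflater produces by default. The dictionary string is illustrative and not what iText literally writes. The zero-filled array is a placeholder for real .ttf bytes.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of a compressed FontFile2 stream's lengths:
// /Length = stored (compressed) size, /Length1 = decoded size of the font program.
class FontStreamSketch {
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(); // zlib wrapper, as expected by /FlateDecode
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of spinning
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Illustrative stream dictionary for a compressed TrueType program.
    static String dictionary(byte[] fontBytes, byte[] compressed) {
        return "<< /Length " + compressed.length
                + " /Length1 " + fontBytes.length
                + " /Filter /FlateDecode >>";
    }

    public static void main(String[] args) {
        byte[] font = new byte[10_000]; // placeholder standing in for a .ttf file
        byte[] compressed = deflate(font);
        System.out.println(dictionary(font, compressed));
    }
}
```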

Now we have the file testC_merged2.pdf (22.03 KB), and that's actually the answer to your question. As you can see, the second option (testB_merged2.pdf, 21.95 KB) is still better than this third option.
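The core of embedFont is a scan over all objects for font descriptors whose /FontName matches the font's PostScript name. Each match then gets a /FontFile2 entry. Below is a toy model using plain maps instead of iText's PdfDictionary. The "ref:N" strings stand in for indirect references, and the font names are hypothetical. It creates the font-file object only once, so several matching descriptors share it instead of each getting its own copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the embedFont scan; PDF objects are plain maps, NOT iText types.
class DescriptorScan {
    // Points every matching font descriptor at one shared font-file object.
    // Returns that object's index, or -1 if no descriptor matched.
    static int attachFontFile(List<Map<String, String>> objects, String fontName) {
        int fontFileIndex = -1;             // created lazily, at most once
        int originalCount = objects.size(); // do not scan objects added here
        for (int i = 0; i < originalCount; i++) {
            Map<String, String> obj = objects.get(i);
            boolean isDescriptor = "FontDescriptor".equals(obj.get("Type"));
            if (isDescriptor && fontName.equals(obj.get("FontName"))) {
                if (fontFileIndex < 0) {
                    Map<String, String> fontFile = new HashMap<>();
                    fontFile.put("Kind", "FontFile2 stream"); // placeholder for the font bytes
                    objects.add(fontFile);
                    fontFileIndex = objects.size() - 1;
                }
                obj.put("FontFile2", "ref:" + fontFileIndex);
            }
        }
        return fontFileIndex;
    }

    static Map<String, String> descriptor(String fontName) {
        Map<String, String> d = new HashMap<>();
        d.put("Type", "FontDescriptor");
        d.put("FontName", fontName);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, String>> objects = new ArrayList<>();
        objects.add(descriptor("GravitasOne"));
        objects.add(descriptor("Helvetica"));
        objects.add(descriptor("GravitasOne"));
        int index = attachFontFile(objects, "GravitasOne");
        System.out.println(index + " " + objects.size()); // 3 4
    }
}
```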

Caveats: This example uses the Gravitas One font as a simple font. As soon as you use the font as a composite font (you tell iText to use it as a composite font by choosing the encoding IDENTITY-H or IDENTITY-V), you can no longer choose whether to embed the font or whether to subset it. As defined in ISO-32000-1, iText will always embed composite fonts and will always subset them.

This means that you can't use the above solutions when you need special fonts (Chinese, Japanese, Korean). In that case, you shouldn't embed the fonts but use so-called CJK fonts instead. These CJK fonts rely on font packs that Adobe Reader can download.
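The composite-font rule from the caveat can be written as a small decision function. This models the behaviour described above and is not iText code. The strings "Identity-H", "Identity-V" and "Cp1252" are the values of iText 5's BaseFont.IDENTITY_H, IDENTITY_V and WINANSI. Treating subset as irrelevant for a non-embedded simple font is an assumption of this model.

```java
// Model of the caveat: composite fonts (Identity-H / Identity-V) are always
// embedded and subset; for simple fonts the caller's flags apply.
final class FontSettings {
    final boolean embedded;
    final boolean subset;

    FontSettings(boolean embedded, boolean subset) {
        this.embedded = embedded;
        this.subset = subset;
    }

    static boolean isComposite(String encoding) {
        return "Identity-H".equals(encoding) || "Identity-V".equals(encoding);
    }

    static FontSettings effective(String encoding, boolean embedded, boolean subset) {
        if (isComposite(encoding)) {
            return new FontSettings(true, true); // requested flags are ignored
        }
        // assumption of this model: subsetting only matters when the font is embedded
        return new FontSettings(embedded, embedded && subset);
    }

    @Override
    public String toString() {
        return "embedded=" + embedded + ", subset=" + subset;
    }

    public static void main(String[] args) {
        System.out.println(effective("Identity-H", false, false)); // embedded=true, subset=true
        System.out.println(effective("Cp1252", true, false));      // embedded=true, subset=false
    }
}
```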
