Why does combining PDFs make filesize balloon?


Problem description

I'm attempting to stitch together various PDFs. They're not that text-heavy, with the occasional image. Say, for example, I have two PDFs of 1.4 MB and 740 KB: when I combine them they balloon to 6 MB!

I've tried scripted combination and appending by hand, with the same result, so I'm guessing it's an underlying issue. Some explanation of why it happens would be useful, so I can look at ways of avoiding it. Is it a mismatch in colour models? The fonts are minimal.

Solution

You aren't telling us how you're combining the PDFs, which makes your question rather theoretical, so I am going to give you a theoretical answer:

Part 1

  • Suppose you have a PDF file with 10 pages and a total size of 1200 KByte.
  • Suppose that the content stream of each page roughly consists of 100 KByte. From this content stream, there are references to shared resources.
  • Suppose that these 10 pages share 200 KByte in resources: they share the same fonts, the same images, and so on.

If you "burst" this PDF into 10 separate single-page PDFs, each PDF will consist of about 300 KByte: 100 KByte in content stream + 200 KByte in resources (I'm ignoring the overhead of having 10 separate xref tables and file trailers).

  • If you combine these 10 separate single-page PDFs as if these 10 PDFs have nothing in common, the total file size will be 10 x 300 KByte. That's 3000 KByte, which is more than double the original 1200 KByte.
  • If you combine these 10 separate single-page PDFs taking into account that they have resources in common (fonts, images, ...), the total size will be (10 x 100 KByte) + 200 KByte = 1200 KByte, the same as the original.
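The arithmetic of those two cases can be written down as a small Python sketch. The function names are made up for this illustration, and the numbers are simply the ones assumed in the example above (xref tables and trailers are still ignored):

```python
# Figures from the theoretical example, all in KByte.
PAGES = 10
CONTENT_PER_PAGE = 100   # content stream of one page
SHARED_RESOURCES = 200   # fonts, images, ... used by every page


def naive_merge_size(pages, content, resources):
    """Each single-page PDF brings along its own copy of the resources."""
    return pages * (content + resources)


def shared_merge_size(pages, content, resources):
    """Resources that are identical across pages are stored only once."""
    return pages * content + resources


naive = naive_merge_size(PAGES, CONTENT_PER_PAGE, SHARED_RESOURCES)
shared = shared_merge_size(PAGES, CONTENT_PER_PAGE, SHARED_RESOURCES)
print(naive, shared, naive / shared)
```

With these inputs the naive merge is 2.5 times the size of the merge that shares resources, and the gap grows with the share of the file taken up by common resources.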

If you're using iText to combine the PDFs, then using PdfCopy will result in the 3000 KByte PDF, because PdfCopy just copies documents as fast as possible without looking at the content of the document. If you want the 1200 KByte PDF, then you need to use PdfSmartCopy in which case you'll need more memory and CPU because iText will examine each PDF and reuse objects that would otherwise be redundant.
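To make the difference between the two strategies concrete, here is a toy model in Python. It is not iText code and does not parse real PDFs: each "document" is just a list of byte blobs, and the helper names (`merge_copy`, `merge_smart`, `total_size`) are hypothetical. It only illustrates the idea of reusing objects that are byte-for-byte identical, which is roughly what a deduplicating merge has to do.

```python
import hashlib


def merge_copy(documents):
    """Copy every object of every document, duplicates included."""
    return [obj for doc in documents for obj in doc]


def merge_smart(documents):
    """Keep only the first copy of each byte-identical object."""
    seen = set()
    merged = []
    for doc in documents:
        for obj in doc:
            digest = hashlib.sha256(obj).digest()
            if digest not in seen:
                seen.add(digest)
                merged.append(obj)
    return merged


def total_size(objects):
    return sum(len(obj) for obj in objects)


# Ten single-page "PDFs": a unique 100,000-byte page plus the same
# 200,000-byte shared resource (standing in for a font or an image).
shared_resource = b"R" * 200_000
docs = [[str(i).encode("ascii") * 100_000, shared_resource] for i in range(10)]

print(total_size(merge_copy(docs)))
print(total_size(merge_smart(docs)))
```

The extra work in `merge_smart` (hashing every object and remembering the digests) is the toy equivalent of the additional memory and CPU mentioned above.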

Part 2

In your question, you mention that you have a 1.4 MB and a 740 KB PDF, and that combining them results in a PDF of 6 MB. The first part of my theoretical example doesn't explain such an extreme growth in size, so here's a second part.

  • In PDF 1.0, PDF syntax wasn't compressed.
  • Starting with PDF 1.2, streams could be compressed, but indirect objects and the cross-reference table were still stored as plain ASCII.
  • Starting with PDF 1.5, a series of objects could be compressed in an object stream, and the cross-reference table could be stored as a compressed cross-reference stream too.

Suppose that your original PDFs use compressed object streams and a compressed cross-reference stream. Suppose that you combine these PDFs into a PDF that is more like a PDF 1.4 document. In that case, the objects that used to live in object streams and the cross-reference data are written out uncompressed, resulting in a much bigger file size.
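A rough way to get a feeling for this effect is to compress some PDF-like object syntax with Python's `zlib` module, which implements the same deflate algorithm that PDF's Flate compression is based on. The page dictionaries below are invented for the illustration and are far more regular than the objects in a real file, so the ratio you get here is only indicative:

```python
import zlib

# 500 made-up page dictionaries, written the way a PDF 1.4 style file
# would store them: as plain ASCII indirect objects.
objects = [
    f"{n} 0 obj\n<< /Type /Page /Parent 1 0 R /Resources 2 0 R "
    f"/MediaBox [0 0 595 842] /Contents {n + 1000} 0 R >>\nendobj\n"
    for n in range(3, 503)
]

plain = "".join(objects).encode("ascii")
packed = zlib.compress(plain)

print(len(plain), len(packed))
```

Going from the compressed form back to the plain form is what happens, in effect, when a merge tool rewrites PDF 1.5 style input as uncompressed objects.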

Part 3?

There might be other reasons, depending on the nature of the original PDFs and on the tool that you're using to combine the PDFs. You should clarify if none of the above applies.
