PDF compressing library/tool


Question

I am working on a project to reduce the size of PDFs by compressing them. Are there any really good tools or libraries (.NET) on the market? I tried a few tools, such as Onstream Compression, but the results were not satisfactory.

Recommended answer

Some additional (mega)bytes can easily be squeezed out of PDFs. For example, is the well-known "PDF32000_2008.pdf" optimized enough? Its file size is 8,995,189 bytes. It uses object and xref streams, has (nearly) no images, and everything is packed tight. Or is it?

Look at a page dictionary:

Dict:9 [1 0 R]
.   /Annots Array:3
.   /Contents Stream:3 [2 0 R]
.   /CropBox Array:4
.   /MediaBox Array:4
.   /Parent Dict:4 [124248 0 R]
.   /Resources Dict:4
.   /Rotate 0 (Number)
.   /StructParents 2 (Number)
.   /Type Page (Name)

Rotate 0 is the default, so why is it there? What is CropBox there for? It defaults to MediaBox, and no page in this document has a CropBox that differs from its MediaBox. And why is MediaBox there? It is inheritable and all pages are the same size, so move it to the Pages tree root! There are 756 pages, i.e. redundant (or useless) information is replicated 756 times.
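The page-level cleanup described above can be sketched as follows. This is only an illustration under assumptions: pages are modeled as plain Python dicts rather than real PDF objects, the page tree is taken to be flat with a single root, and the name `slim_pages` is made up for the example. A real implementation would do the same thing through a PDF library's object model.

```python
def slim_pages(pages_root, pages):
    """Drop redundant page entries and hoist a shared MediaBox to the root.

    Hypothetical model: `pages_root` and each item of `pages` are plain dicts
    keyed by PDF name (without the leading slash). Both are modified in place.
    """
    for page in pages:
        # /Rotate 0 is the default, so the entry carries no information.
        if page.get("Rotate") == 0:
            del page["Rotate"]
        # /CropBox defaults to /MediaBox; an identical copy is redundant.
        if "CropBox" in page and page["CropBox"] == page.get("MediaBox"):
            del page["CropBox"]

    # /MediaBox is inheritable: if every page has the same one, store it once.
    boxes = [page.get("MediaBox") for page in pages]
    if pages and boxes[0] is not None and all(b == boxes[0] for b in boxes):
        pages_root["MediaBox"] = boxes[0]
        for page in pages:
            del page["MediaBox"]
    return pages_root, pages


root = {"Type": "Pages"}
pages = [
    {"Type": "Page", "Rotate": 0,
     "MediaBox": [0, 0, 612, 792], "CropBox": [0, 0, 612, 792]}
    for _ in range(3)
]
slim_pages(root, pages)
print(root)      # the shared MediaBox now lives on the root
print(pages[0])  # only /Type is left on each page
```

Pages whose MediaBox differs from the others are left alone in this sketch; with intermediate Pages nodes, the hoisting would have to be done per subtree.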

Look at a typical annotation dictionary:

Dict:6 [3548 0 R]
.   /A Dict:2
.   .   /S URI (Name)
.   .   /URI http://www.iso.org/iso/iso_catalogue/... (String)
.   /Border Array:3
.   .   [0] 0 (Number)
.   .   [1] 0 (Number)
.   .   [2] 0 (Number)
.   /Rect Array:4
.   .   [0] 82.14 (Number)
.   .   [1] 576.8 (Number)
.   .   [2] 137.1 (Number)
.   .   [3] 587.18 (Number)
.   /StructParent 3 (Number)
.   /Subtype Link (Name)
.   /Type Annot (Name)

There are thousands (maybe more than 10,000?) of link annotations in this document. The /Type key is optional, so why is it there? The links are invisible rectangles; do you think placement precision finer than a whole number of points is relevant? Round the coordinates to integers.
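A minimal sketch of that annotation cleanup, again on plain dicts rather than real PDF objects. The function name `slim_link_annotation` is hypothetical, and rounding outward (floor for the lower-left corner, ceil for the upper-right) is my assumption, chosen so the clickable area never shrinks; plain rounding to the nearest integer would match the text equally well. The sketch also assumes /Rect is stored as lower-left then upper-right.

```python
import math


def slim_link_annotation(annot):
    """Return a slimmer copy of a link annotation dict (hypothetical model)."""
    out = dict(annot)
    if out.get("Subtype") != "Link":
        return out  # leave other annotation types untouched
    # /Type is optional on annotation dictionaries, so it can go.
    out.pop("Type", None)
    if "Rect" in out:
        llx, lly, urx, ury = out["Rect"]
        # Round outward so the integer rectangle still covers the original.
        out["Rect"] = [math.floor(llx), math.floor(lly),
                       math.ceil(urx), math.ceil(ury)]
    return out


link = {
    "Type": "Annot",
    "Subtype": "Link",
    "Rect": [82.14, 576.8, 137.1, 587.18],
    "Border": [0, 0, 0],
    "StructParent": 3,
}
print(slim_link_annotation(link)["Rect"])  # [82, 576, 138, 588]
```

The saving per annotation is a handful of bytes, but multiplied by thousands of links it adds up.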

Look at a fragment of a typical page content stream, a text-showing operator:

[(w)7(ed)-6( b)21(u)1(t shal)-6(l no)-6(t b)-6(e)1( ed)-6(ite)-6(d)1( un)-6(less the typef)23(aces wh)-6(ich )]TJ

Kerning below some threshold is all but invisible. The exact threshold is debatable; it is like a JPEG compression quality level, acceptable to some while others disagree. I think a very conservative estimate (i.e. one retaining most of the quality), with an effect invisible to the average person, is that kerning with an absolute value of less than 10 may be omitted. (Care must be taken to preserve justification, of course.) (And I am not even counting the files out there that have fractional kerning with a precision of 3-6 decimal places! This file does not.)
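As a rough illustration of the kerning idea, the sketch below works on a TJ array modeled as a Python list of strings and numbers (parsing and re-serializing the content stream is out of scope). The name `drop_small_kerning` and the returned drift value are my additions, not part of the original answer: the drift is the sum of the dropped adjustments, which a caller could use to check that a justified line has not moved noticeably.

```python
def drop_small_kerning(tj_items, threshold=10):
    """Omit TJ adjustments with |value| < threshold and merge adjacent strings.

    Returns (new_items, drift), where drift is the sum of dropped adjustments
    in thousandths of a text space unit.
    """
    out = []
    drift = 0
    for item in tj_items:
        if isinstance(item, str):
            # Strings that are now adjacent collapse into one.
            if out and isinstance(out[-1], str):
                out[-1] += item
            else:
                out.append(item)
        elif abs(item) < threshold:
            drift += item  # dropped, but remembered for the caller
        else:
            out.append(item)
    return out, drift


# The fragment shown above, as a list.
tj = ["w", 7, "ed", -6, " b", 21, "u", 1, "t shal", -6, "l no", -6,
      "t b", -6, "e", 1, " ed", -6, "ite", -6, "d", 1, " un", -6,
      "less the typef", 23, "aces wh", -6, "ich "]
items, drift = drop_small_kerning(tj)
print(items)
# ['wed b', 21, 'ut shall not be edited unless the typef', 23, 'aces which ']
print(drift)  # -38
```

Here 29 array elements shrink to 5. The net drift of -38 thousandths of a unit over the whole fragment is the kind of figure to watch when justification matters.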

And with the optimizations mentioned above, the file size became 7,982,478 bytes: about one megabyte (1,012,711 bytes, roughly 11%) shaved off. And that is certainly not the limit; there may be other, better hidden, sources of optimization.
