How to remove duplicate objects in PDF using ghostscript?

Problem Description

Using command-line ghostscript, is it possible to remove duplicate embedded objects (images) in the PDF and replace them with a single instance?

I have a 200+ page PDF with a background image and some smaller logos on each page. The file is very large because the very same background image and logo binaries are embedded separately in every page, instead of being embedded once and then referenced from each page. I am not the creator of the PDF, so I cannot solve the problem at its source.

(I do not want to shrink the images or reduce their quality, and I do not want to delete them completely.)

Recommended Answer

No, ghostscript (more specifically the pdfwrite device) won't replace duplicate image XObjects or inline images; it doesn't test them to see if they are identical.

It would be possible to do so, but it means checking every byte of each image, which can be very expensive in terms of performance, so we don't do it at the moment. If you want to have a go at modifying the source, I can give some suggestions on where to start.
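To make the cost concrete, here is a minimal sketch of content-based image deduplication, assuming the raw image stream bytes have already been extracted from the PDF (the extraction step is not shown, and `dedupe_images` is a hypothetical helper, not part of Ghostscript). Every byte of every stream is fed to SHA-256, so the work grows with the total size of all images, which is exactly the expense described above.

```python
import hashlib
from typing import Dict, List


def dedupe_images(streams: List[bytes]) -> List[int]:
    """Return, for each image stream, the index of the first identical stream.

    Hypothetical helper: every byte of every stream is hashed, so the cost is
    proportional to the combined size of all images.
    """
    first_seen: Dict[str, int] = {}  # digest -> index of first stream with it
    canonical: List[int] = []
    for i, data in enumerate(streams):
        digest = hashlib.sha256(data).hexdigest()
        # setdefault stores i for a new digest, or returns the earlier index
        canonical.append(first_seen.setdefault(digest, i))
    return canonical


if __name__ == "__main__":
    pages = [b"background", b"logo", b"background", b"logo", b"other"]
    print(dedupe_images(pages))  # [0, 1, 0, 1, 4]
```

A PDF writer could then emit each distinct stream once and point every duplicate at its canonical copy. Note that this only catches byte-identical encodings: the same picture compressed with different settings would hash differently.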

FWIW, many other object types are checked for duplicates, but images are not, simply because of the time it takes to read and hash large images.
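One common way to reduce (but not eliminate) the hashing cost is to group streams by byte length first: two streams of different lengths cannot be identical, so only streams that share a length need to be hashed at all. The sketch below is a hypothetical illustration of that idea on already-extracted stream bytes; it is not how Ghostscript's pdfwrite works, and in the worst case (many images of equal size) it still reads every byte.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def dedupe_by_size_then_hash(streams: List[bytes]) -> Tuple[List[int], int]:
    """Map each stream to the index of its first identical stream.

    Hypothetical helper: only streams whose length matches another stream's
    length are hashed. Returns (canonical indices, number of streams hashed).
    """
    by_size: Dict[int, List[int]] = defaultdict(list)
    for i, data in enumerate(streams):
        by_size[len(data)].append(i)  # indices stay in ascending order

    canonical = list(range(len(streams)))  # default: each stream is unique
    hashed = 0
    for indices in by_size.values():
        if len(indices) < 2:
            continue  # a unique length cannot have a duplicate; skip hashing
        first_seen: Dict[bytes, int] = {}
        for i in indices:
            digest = hashlib.sha256(streams[i]).digest()
            hashed += 1
            canonical[i] = first_seen.setdefault(digest, i)
    return canonical, hashed


if __name__ == "__main__":
    streams = [b"bg-image", b"logo", b"bg-image", b"x"]
    print(dedupe_by_size_then_hash(streams))  # ([0, 1, 0, 3], 2)
```

Here only the two 8-byte streams are hashed; the 4-byte and 1-byte streams are skipped because no other stream has the same length.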
