PDF file size increases significantly when copied using the iText Java library

Question

I am trying to copy an existing PDF file into a new file using the itextpdf library in Java. I am using version 5.5.10 of itextpdf. I am facing different issues with the two approaches I tried, PdfStamper and PdfCopy. When I use the PdfStamper class, the new file is much larger than the original, although no new items were added. Here is the code:

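    import java.io.FileOutputStream;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStamper;

    // Backslashes must be escaped in Java string literals
    String currFile = "C:\\misc\\pdffiles\\AcroJS.pdf";
    String dest = "C:\\misc\\pdffiles\\AcroJS_copy.pdf";
    PdfReader reader = new PdfReader(currFile);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();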

Some observations: a 7 MB original grew to about 13 MB in the new file, and a 116 KB original grew to about 119 KB.

I was expecting approximately the same file size when just copying an existing PDF file. I cannot figure out why the size increases that much.

I have tried the PdfCopy class as well. I followed two approaches with PdfCopy:


  1. Copying page by page.

  2. Calling setMergeFields() on the PdfCopy object and then calling pdfcopy.addDocument(reader);

But the problem with both approaches is that they throw away some non-content metadata from the PDF file, so the new PDF breaks when opened in Adobe Reader. For example, my PDF contains the dictionary object PdfName.S. In this case the newly created PDF file is just 2 KB (the original was 1.6 MB), which clearly means nothing was copied into the document and it is broken.

My original requirement is very simple: copy an existing PDF to a new PDF file without an increase in size and without throwing away necessary items. Obviously this is not a plain copy, paste, and rename, because in the next step I have some processing to do on the PDF content. Any help will be much appreciated.

OS: Windows 10 Pro
Java: 1.8.101
iText: 5.5.10

Thanks

Recommended answer

Use of PdfStamper

Your code

PdfStamper stamper = new PdfStamper(reader,new FileOutputStream(dest)) ;
stamper.close() ;

essentially tells iText to copy the original PDF, throwing away unused objects and using iText's default compression settings.

By default, iText does not use compressed cross-reference streams and object streams (introduced in PDF 1.5); it uses the older technique of cross-reference tables and individually compressed objects.

The sample file, on the other hand, does use these techniques. Thus, it is much better compressed.
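To see which storage technique a given file uses, a rough check on the raw bytes is often enough. The following is a minimal sketch using only the Java standard library; the class `PdfStructureProbe` and its method names are hypothetical helpers, not part of iText. It is a heuristic: it only searches for the `/ObjStm` and `/XRef` type entries and for the `xref` keyword, so a compressed stream that happens to contain those bytes could produce a false positive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper (not part of iText): heuristic checks on raw PDF bytes.
final class PdfStructureProbe {
    // Object streams and cross-reference streams are marked by their /Type entry.
    private static final Pattern OBJECT_STREAM = Pattern.compile("/Type\\s*/ObjStm");
    private static final Pattern XREF_STREAM = Pattern.compile("/Type\\s*/XRef");
    // A classic table starts with the keyword "xref"; exclude "startxref".
    private static final Pattern XREF_TABLE = Pattern.compile("(?<!start)xref\\s");

    // ISO-8859-1 maps every byte to exactly one char, so no bytes are lost.
    private static String asLatin1(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1);
    }

    static boolean usesObjectStreams(byte[] pdf) {
        return OBJECT_STREAM.matcher(asLatin1(pdf)).find();
    }

    static boolean usesXrefStream(byte[] pdf) {
        return XREF_STREAM.matcher(asLatin1(pdf)).find();
    }

    static boolean hasClassicXrefTable(byte[] pdf) {
        return XREF_TABLE.matcher(asLatin1(pdf)).find();
    }

    // Usage: java PdfStructureProbe original.pdf copy.pdf
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] pdf = Files.readAllBytes(Paths.get(path));
            System.out.printf("%s: objectStreams=%b xrefStream=%b xrefTable=%b%n",
                    path, usesObjectStreams(pdf), usesXrefStream(pdf),
                    hasClassicXrefTable(pdf));
        }
    }
}
```

Running it on the original and on the stamped copy should show whether the copy fell back to a classic cross-reference table without object streams, which would be consistent with the size increase described above.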

You can tell iText to use these improved compression techniques, too, like this:

PdfReader reader = new PdfReader(resourceStream);
PdfStamper stamper = new PdfStamper(reader, outputStream);
stamper.setFullCompression();

stamper.close();

(Stamping.java test method testStampAcroJSCompressed)

This results in a file less than 4 MB in size.

If you want to remain faithful to the original way objects were stored, you can instead use append mode, which copies the original file identically and adds changes in the form of a so-called incremental update, like this:

PdfReader reader = new PdfReader(resourceStream);
PdfStamper stamper = new PdfStamper(reader, outputStream, '\0', true);

stamper.close();

(Stamping.java test method testStampAcroJSAppended)

This results in a file slightly larger than the original file.
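One way to confirm that an output really is an incremental update is to check that it begins with the exact bytes of the original and carries an additional `%%EOF` marker. The sketch below uses only the Java standard library; `IncrementalUpdateProbe` and its methods are hypothetical names introduced for illustration, and the marker count is a heuristic, since `%%EOF` could in principle also appear inside stream data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of iText): checks for an appended revision.
final class IncrementalUpdateProbe {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Counts non-overlapping occurrences of "%%EOF"; each revision ends with one.
    static int countEofMarkers(byte[] pdf) {
        int count = 0;
        for (int i = 0; i + EOF_MARKER.length <= pdf.length; i++) {
            boolean match = true;
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (pdf[i + j] != EOF_MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += EOF_MARKER.length - 1; // skip past the marker just found
            }
        }
        return count;
    }

    // True if 'copy' is strictly longer than 'original' and starts with its bytes.
    static boolean isAppendedCopy(byte[] original, byte[] copy) {
        if (copy.length <= original.length) {
            return false;
        }
        for (int i = 0; i < original.length; i++) {
            if (copy[i] != original[i]) {
                return false;
            }
        }
        return true;
    }

    // Usage: java IncrementalUpdateProbe original.pdf appended.pdf
    public static void main(String[] args) throws IOException {
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] copy = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("original %%EOF markers: " + countEofMarkers(original));
        System.out.println("copy %%EOF markers: " + countEofMarkers(copy));
        System.out.println("copy extends original: " + isAppendedCopy(original, copy));
    }
}
```

For a file written in append mode, the copy would be expected to extend the original and to have at least one more `%%EOF` marker than it.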

You found that PdfCopy

is throwing away some non-content metadata

Of course it does. PdfCopy is designed to copy pages from one PDF to another, keeping content and annotations as they were but ignoring other page-level and all document-level information.
