PDFBox saying PDDocument closed when it's not

Question

I am trying to populate repeated forms with PDFBox. I am using a TreeMap and populating the forms with individual records. The PDF form is laid out so that six records are listed on page one and a static page is inserted as page two. (For a TreeMap with more than six records, the process repeats.) The error I'm getting depends on the size of the TreeMap, and therein lies my problem: I can't figure out why I get this warning when I populate the TreeMap with more than 35 entries:

Apr 23, 2018 2:36:25 AM org.apache.pdfbox.cos.COSDocument finalize WARNING: Warning: You did not close a PDF Document

import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.TreeMap;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

public class test {
    public static void main(String[] args) throws IOException {
        File dataFile = new File("dataFile.csv");
        File fi = new File("form.pdf");
        Scanner fileScanner = new Scanner(dataFile);
        fileScanner.nextLine(); // skip the CSV header line
        // read 37 records, keyed by their upper-cased ID with spaces removed
        TreeMap<String, String[]> assetTable = new TreeMap<String, String[]>();
        int x = 0;
        while (x <= 36) {
            String lineIn = fileScanner.nextLine();
            String[] elements = lineIn.split(",");
            elements[0] = elements[0].toUpperCase().replaceAll(" ", "");
            String key = elements[0];
            assetTable.put(key, elements);
            x++;
        }
        fileScanner.close();
        PDDocument newDoc = new PDDocument();
        int control = 1;
        PDDocument doc = PDDocument.load(fi);
        PDDocumentCatalog cat = doc.getDocumentCatalog();
        PDAcroForm form = cat.getAcroForm();
        for (String s : assetTable.keySet()) {
            if (control <= 6) {
                // fill record slot number "control" of the current form copy
                String[] row = assetTable.get(s);
                PDField IDno1 = form.getField("IDno" + control);
                PDField Locno1 = form.getField("locNo" + control);
                PDField serno1 = form.getField("serNo" + control);
                PDField typeno1 = form.getField("typeNo" + control);
                PDField maintno1 = form.getField("maintNo" + control);
                IDno1.setValue(row[0]);
                IDno1.setReadOnly(true);
                Locno1.setValue(row[1]);
                Locno1.setReadOnly(true);
                serno1.setValue(row[2]);
                serno1.setReadOnly(true);
                typeno1.setValue(row[3]);
                typeno1.setReadOnly(true);
                String type = "";
                if (row[5].equals("1"))
                    type += "Hydrotest";
                if (row[5].equals("6"))
                    type += "6 Year Maintenance";
                maintno1.setValue(row[4] + " - " + type);
                maintno1.setReadOnly(true);
                control++;
            } else {
                // seventh iteration: finish this copy and append its two pages;
                // note that the current record s is not written in this branch
                PDField dateIn = form.getField("dateIn");
                dateIn.setValue("1/2019 Yearlies");
                dateIn.setReadOnly(true);
                PDField tagDate = form.getField("tagDate");
                tagDate.setValue("2019 / 2020");
                tagDate.setReadOnly(true);
                newDoc.addPage(doc.getPage(0));
                newDoc.addPage(doc.getPage(1));
                control = 1;
                // load a fresh copy of the form; the previous PDDocument is no longer
                // referenced here, although newDoc still uses its pages
                doc = PDDocument.load(fi);
                cat = doc.getDocumentCatalog();
                form = cat.getAcroForm();
            }
        }
        // finish the last (possibly partial) copy
        PDField dateIn = form.getField("dateIn");
        dateIn.setValue("1/2019 Yearlies");
        dateIn.setReadOnly(true);
        PDField tagDate = form.getField("tagDate");
        tagDate.setValue("2019 / 2020");
        tagDate.setReadOnly(true);
        newDoc.addPage(doc.getPage(0));
        newDoc.addPage(doc.getPage(1));
        newDoc.save("PDFtest.pdf");
        Desktop.getDesktop().open(new File("PDFtest.pdf"));
    }
}

I can't figure out for the life of me what I'm doing wrong. This is my first week working with PDFBox, so I'm hoping it's something simple.

Updated error message

WARNING: Warning: You did not close a PDF Document
Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
    at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:77)
    at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:125)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1200)
    at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:383)
    at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:522)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:460)
    at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:444)
    at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1096)
    at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:419)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1367)
    at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1254)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1232)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1204)
    at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1192)
    at test.test.main(test.java:87)

Answer

The warning by itself

You appear to have misread the warning. It says:

Warning: You did not close a PDF Document

So, contrary to what your title says ("PDFbox saying PDDocument closed when its not"), PDFBox says that you did not close a document!

After your edit, one sees that the exception actually says that a COSStream has been closed and that a possible cause is that the enclosing PDDocument has already been closed. This is merely a possibility!

That being said, by adding pages from one document to another, you probably end up with both documents referencing those pages. In that case, when both documents are closed (e.g. automatically via garbage collection), the second one to close may indeed stumble across some already closed COSStream instances.

So my first piece of advice, to simply close the documents at the end with

doc.close();
newDoc.close();

probably won't remove the warnings, merely change their timing.

Actually, you don't merely create two documents, doc and newDoc; you create new PDDocument instances and assign them to doc again and again, thereby leaving each previous document object that variable held unreferenced and free for garbage collection. So you eventually have a large number of documents that get closed as soon as they are no longer referenced.

I don't think it would be a good idea to close all those documents in doc early, in particular not before saving newDoc.
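To make the ordering issue concrete, here is a toy model in plain Java. It is not PDFBox code: `ToySource`, `ToyPage` and `ToyTarget` are hypothetical stand-ins for a loaded form document, a page whose content stream that document owns, and newDoc. A page added by reference still reads its content from its owner when the target is saved, so closing the source first makes the save fail, loosely mirroring the "COSStream has been closed" exception above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Toy model only (not PDFBox): a page shared by reference keeps reading
// its content from the document that owns it.
class ToySource implements AutoCloseable {
    private boolean closed;

    ToyPage getPage(String content) {
        return new ToyPage(this, content);
    }

    void checkOpen() throws IOException {
        if (closed) {
            throw new IOException("stream closed; enclosing source already closed");
        }
    }

    @Override
    public void close() {
        closed = true;
    }
}

class ToyPage {
    private final ToySource owner;
    private final String content;

    ToyPage(ToySource owner, String content) {
        this.owner = owner;
        this.content = content;
    }

    // Reading the content only works while the owning source is open.
    String read() throws IOException {
        owner.checkOpen();
        return content;
    }
}

class ToyTarget {
    private final List<ToyPage> pages = new ArrayList<>();

    // Like newDoc.addPage(doc.getPage(0)): the page is shared, not copied.
    void addPage(ToyPage page) {
        pages.add(page);
    }

    // "Saving" reads every page, much as writing a PDF reads every stream.
    String save() throws IOException {
        StringBuilder out = new StringBuilder();
        for (ToyPage page : pages) {
            out.append(page.read());
        }
        return out.toString();
    }
}

class ToySharedPageDemo {
    // closeSourceFirst mimics closing doc before saving newDoc.
    static String run(boolean closeSourceFirst) {
        ToyTarget target = new ToyTarget();
        ToySource source = new ToySource();
        target.addPage(source.getPage("A"));
        if (closeSourceFirst) {
            source.close();
        }
        try {
            return target.save();
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        } finally {
            source.close(); // closing after the save is harmless
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // failed: stream closed; enclosing source already closed
        System.out.println(run(false)); // A
    }
}
```

In real PDFBox the shared objects are COSStream instances backed by the source document's storage, and garbage collection can trigger the early close without any explicit call; the model only captures the "save after the owner is closed" part.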

But if your code will eventually be run as part of a larger application instead of as a small, one-shot test application, you should collect all those PDDocument instances in some Collection and explicitly close them right after saving newDoc and then clear the collection.

Actually, your exception looks as if one of those lost PDDocument instances has already been closed by garbage collection, so you should collect the documents even in a simple one-shot utility, to keep the GC from disposing of them.

(@Tilman, please correct me if I'm wrong...)
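The collect-then-close advice can be sketched as a small generic holder. This is a plain-Java sketch under my own assumptions, not a PDFBox utility; `DocumentCollector` is a hypothetical name, and it relies only on PDDocument implementing java.io.Closeable, as it does in PDFBox 2.x. Keeping every loaded document in the list keeps it strongly referenced, so it cannot be finalized before newDoc is saved.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps every opened document strongly referenced
// (so the garbage collector cannot finalize it early) and closes them all later.
final class DocumentCollector<T extends Closeable> implements Closeable {
    private final List<T> tracked = new ArrayList<>();

    // Register a document and return it, so the call can wrap a load call.
    T track(T document) {
        tracked.add(document);
        return document;
    }

    int size() {
        return tracked.size();
    }

    // Close everything that was tracked, then clear the collection.
    // The first failure is rethrown after all closes have been attempted.
    @Override
    public void close() throws IOException {
        IOException first = null;
        for (T document : tracked) {
            try {
                document.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        tracked.clear();
        if (first != null) {
            throw first;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] closed = {0};
        DocumentCollector<Closeable> docs = new DocumentCollector<>();
        docs.track(() -> closed[0]++); // stand-ins for loaded documents
        docs.track(() -> closed[0]++);
        docs.close();
        System.out.println("closed " + closed[0] + " documents"); // closed 2 documents
    }
}
```

In the question's code this would mean something like `doc = sources.track(PDDocument.load(fi));` at both places where the form is loaded, and `sources.close();` only after `newDoc.save("PDFtest.pdf");`, followed by `newDoc.close();`.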

To prevent problems with different documents sharing pages, you can try to import the pages into the target document and then add the imported pages to the target document's page tree, i.e. replace

newDoc.addPage(doc.getPage(0));
newDoc.addPage(doc.getPage(1));

with

newDoc.addPage(newDoc.importPage(doc.getPage(0)));
newDoc.addPage(newDoc.importPage(doc.getPage(1)));

This should allow you to close each PDDocument instance held in doc before losing your reference to it. There are certain drawbacks to this, though; cf. the method JavaDoc and this answer here.

In your combined document you will have many fields with the same name (at least if your CSV file has a sufficiently high number of entries), which you initially set to different values. Moreover, you access the fields via the PDAcroForm of the respective original document but don't add them to the PDAcroForm of the combined result document.

This is asking for trouble! The PDF format does consider forms to be document-wide with all fields referenced (directly or indirectly) from the AcroForm dictionary of the document, and it expects fields with the same name to effectively be different visualizations of the same field and therefore to all have the same value.

Thus, PDF processors might handle your document fields in unexpected ways, e.g.

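  • by showing the same value in all fields with the same name (as they are supposed to have the same value), or
  • by ignoring the fields (as they are not in the document's AcroForm structure).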

In particular, programmatic reading of your PDF field values will fail, because in that context the form is definitely considered document-wide and based on AcroForm. PDF viewers, on the other hand, might at first show the values you set and make things look OK.

To prevent this, you should rename the fields before merging. You might consider using the PDFMergerUtility, which does such renaming under the hood. For an example of how to use that utility class, have a look at the PDFMergerExample.
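Why same-named fields collide, and why renaming helps, can be sketched with a plain map standing in for the document-wide form: a reader looks fields up by name, so values set on different copies of "IDno1" collapse into one entry (in this toy the last value wins; a real processor may behave differently). The `_copy<N>` suffix is a hypothetical naming scheme for illustration only, not necessarily what PDFMergerUtility does internally; in PDFBox, such new names could be applied with `PDField.setPartialName(...)` before the pages are combined.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: a document-wide form is a single map from field name to value.
final class FieldNameDemo {

    // Hypothetical renaming scheme: give each copy of the form its own suffix.
    static String uniqueName(String fieldName, int copyIndex) {
        return fieldName + "_copy" + copyIndex;
    }

    // Fill the same field in several form copies, optionally renaming it per copy,
    // and return what a document-wide reader would see.
    static Map<String, String> fill(String fieldName, List<String> valuesPerCopy, boolean rename) {
        Map<String, String> documentWideForm = new LinkedHashMap<>();
        for (int copy = 0; copy < valuesPerCopy.size(); copy++) {
            String name = rename ? uniqueName(fieldName, copy + 1) : fieldName;
            documentWideForm.put(name, valuesPerCopy.get(copy)); // same name overwrites
        }
        return documentWideForm;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("A-100", "B-200", "C-300");
        System.out.println(fill("IDno1", ids, false)); // {IDno1=C-300}
        System.out.println(fill("IDno1", ids, true));
        // {IDno1_copy1=A-100, IDno1_copy2=B-200, IDno1_copy3=C-300}
    }
}
```

Only the renamed variant keeps one distinct entry per record, which is what a document-wide AcroForm needs.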
