pdfbox-添加pdf嵌入式文件并将PDDocument保存到OutputStream不会保留嵌入式文件 [英] Pdfbox - adding pdf embedded File and save the PDDocument to OutputStream does not keep the embedded Files

查看:330
本文介绍了pdfbox-添加pdf嵌入式文件并将PDDocument保存到OutputStream不会保留嵌入式文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Pdfbox(1.8.8)向pdf添加附件.我的问题是当附件之一为.pdf类型并且我将PDDocument保存到OutputStream时,最终的pdf文档不包括附件.如果将PDDocument保存到文件中,则OutputStream都可以正常工作;如果附件不包含任何pdf,则保存到文件或OutputStream都可以正常工作.

我想知道是否有任何方法可以添加pdf嵌入文件并将PDDocument保存到OutputStream,从而将附件保留在最终生成的pdf中.

我正在使用的代码是:

  private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

            final PDDocument doc;
            Boolean hasPdfAttach = false;
            try {
                doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
                // final PDFTextStripper pdfStripper = new PDFTextStripper();
                // final String text = pdfStripper.getText(doc);
                final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
                final Map embeddedFileMap = new HashMap();
                PDEmbeddedFile embeddedFile;
                File file = null;

                for (Attachment attach : attachmentsResources) {

                    // first create the file specification, which holds the embedded file
                    final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
                    fileSpecification.setFile(attach.getFilename());
                    file = AttachmentUtils.getAttachmentFile(attach);
                    final InputStream is = new FileInputStream(file.getAbsolutePath());

                    embeddedFile = new PDEmbeddedFile(doc, is);
                    // set some of the attributes of the embedded file
                    if ("application/pdf".equals(attach.getMimetype())) {
                        hasPdfAttach = true;
                    }
                    embeddedFile.setSubtype(attach.getMimetype());
                    embeddedFile.setSize((int) (long) attach.getFilesize());
                    fileSpecification.setEmbeddedFile(embeddedFile);

                    // now add the entry to the embedded file tree and set in the document.
                    embeddedFileMap.put(attach.getFilename(), fileSpecification);
                    // final String text2 = pdfStripper.getText(doc);
                }
                // final String text3 = pdfStripper.getText(doc);
                efTree.setNames(embeddedFileMap);
                // ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS); (this not work for me)
                // attachments are stored as part of the "names" dictionary in the document catalog
                final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
                names.setEmbeddedFiles(efTree);
                doc.getDocumentCatalog().setNames(names);
                // final ByteArrayOutputStream pdfboxToDocumentStream = new ByteArrayOutputStream();
                final String tmpfile = "temporary.pdf";
                if (hasPdfAttach) {
                    final File f = new File(tmpfile);
                    doc.save(f);
                    doc.close();
                     //i have try with parser but without success too
                    // PDFParser parser = new PDFParser(new FileInputStream(tmpfile));
                    // parser.parse();
                    // PDDocument doc2 = parser.getPDDocument();
                    final PDDocument doc2 = PDDocument.loadNonSeq(f, new RandomAccessFile(new File(getHomeTMP()
                            + "tempppp.pdf"), "r"));
                    doc2.save(out);
                    doc2.close();
                } else {
                    doc.save(out);
                    doc.close();
                }
                 //that does not work too
                // final InputStream in = new FileInputStream(tmpfile);
                // IOUtils.copy(in, out);
                // out = new FileOutputStream(tmpFile);
                // doc.save (out);

            } catch (IOException e1) {
                e1.printStackTrace();
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }
 

最诚挚的问候

解决方案:

 private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

    final PDDocument doc;
    try {
        doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
        ((ByteArrayOutputStream) out).reset();
        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        final Map embeddedFileMap = new HashMap();
        PDEmbeddedFile embeddedFile;
        File file = null;

        for (Attachment attach : attachmentsResources) {

            // first create the file specification, which holds the embedded file
            final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
            fileSpecification.setFile(attach.getFilename());
            file = AttachmentUtils.getAttachmentFile(attach);
            final InputStream is = new FileInputStream(file.getAbsolutePath());

            embeddedFile = new PDEmbeddedFile(doc, is);
            // set some of the attributes of the embedded file
            embeddedFile.setSubtype(attach.getMimetype());
            embeddedFile.setSize((int) (long) attach.getFilesize());
            fileSpecification.setEmbeddedFile(embeddedFile);

            // now add the entry to the embedded file tree and set in the document.
            embeddedFileMap.put(attach.getFilename(), fileSpecification);

        }
        efTree.setNames(embeddedFileMap);
        ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
        // attachments are stored as part of the "names" dictionary in the document catalog
        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        doc.getDocumentCatalog().setNames(names);
        ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
        doc.save(out);
        doc.close();

    } catch (IOException e1) {
        e1.printStackTrace();
    } catch (Exception e2) {
        e2.printStackTrace();
    }
}
 

解决方案

您将新的PDF存储在out中的原始PDF之后:

查看方法中out的所有用法:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
    ...
            doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
    ...
                doc2.save(out);
    ...
                doc.save(out);

因此,您将输入ByteArrayOutputStream作为输入并将其当前内容作为输入(即ByteArrayOutputStream不为空,但已经包含PDF),并在进行一些处理后将修改后的PDF附加到ByteArrayOutputStream.视您向其展示的PDF查看器而定,将显示原始PDF或经过处理的PDF,或者显示(非常正确的)错误消息,表明该文件是垃圾.

如果您希望ByteArrayOutputStream仅包含经过处理的PDF,只需添加

((ByteArrayOutputStream) out).reset();

或(如果不确定流的状态)

out = new ByteArrayOutputStream();

紧接着

doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));

PS::根据评论,OP尝试对他的代码进行上述提议的更改,但未成功.

我无法运行问题中提出的代码,因为它不是独立的.因此,我将其简化为获得独立测试的必要条件:

@Test
public void test() throws IOException, COSVisitorException
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (
            InputStream sourceStream = getClass().getResourceAsStream("test.pdf");
            InputStream attachStream = getClass().getResourceAsStream("artificial text.pdf"))
    {
        final PDDocument document = PDDocument.load(sourceStream);

        final PDEmbeddedFile embeddedFile = new PDEmbeddedFile(document, attachStream);
        embeddedFile.setSubtype("application/pdf");
        embeddedFile.setSize(10993);

        final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
        fileSpecification.setFile("artificial text.pdf");
        fileSpecification.setEmbeddedFile(embeddedFile);

        final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();
        embeddedFileMap.put("artificial text.pdf", fileSpecification);

        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        efTree.setNames(embeddedFileMap);

        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(document.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        document.getDocumentCatalog().setNames(names);

        document.save(baos);
        document.close();
    }
    Files.write(Paths.get("attachment.pdf"), baos.toByteArray());
}

如您所见,PDFBox在这里仅使用流.结果:

"attachment.pdf" >

因此,没有问题的PDFBox可以存储嵌入了PDF文件附件的PDF.

因此,问题很可能与这样的工作流程无关

I'm using Pdfbox (1.8.8) to adding attachments to a pdf. My problem is when one of the attachments is of type .pdf and i'm saving the PDDocument to OutputStream the final pdf document does not include the attachments. If a save the PDDocument to a file instead an OutputStream all works just fine, and if the attachments does not include any pdf, both save to file or OutputStream works fine.

I would like to know if there is any way to add pdf embedded Files and save the PDDocument to OutputStream keeping the attached files in the final pdf that is generated.

The code i'm using is:

 private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

            final PDDocument doc;
            Boolean hasPdfAttach = false;
            try {
                doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
                // final PDFTextStripper pdfStripper = new PDFTextStripper();
                // final String text = pdfStripper.getText(doc);
                final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
                final Map embeddedFileMap = new HashMap();
                PDEmbeddedFile embeddedFile;
                File file = null;

                for (Attachment attach : attachmentsResources) {

                    // first create the file specification, which holds the embedded file
                    final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
                    fileSpecification.setFile(attach.getFilename());
                    file = AttachmentUtils.getAttachmentFile(attach);
                    final InputStream is = new FileInputStream(file.getAbsolutePath());

                    embeddedFile = new PDEmbeddedFile(doc, is);
                    // set some of the attributes of the embedded file
                    if ("application/pdf".equals(attach.getMimetype())) {
                        hasPdfAttach = true;
                    }
                    embeddedFile.setSubtype(attach.getMimetype());
                    embeddedFile.setSize((int) (long) attach.getFilesize());
                    fileSpecification.setEmbeddedFile(embeddedFile);

                    // now add the entry to the embedded file tree and set in the document.
                    embeddedFileMap.put(attach.getFilename(), fileSpecification);
                    // final String text2 = pdfStripper.getText(doc);
                }
                // final String text3 = pdfStripper.getText(doc);
                efTree.setNames(embeddedFileMap);
                // ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS); (this not work for me)
                // attachments are stored as part of the "names" dictionary in the document catalog
                final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
                names.setEmbeddedFiles(efTree);
                doc.getDocumentCatalog().setNames(names);
                // final ByteArrayOutputStream pdfboxToDocumentStream = new ByteArrayOutputStream();
                final String tmpfile = "temporary.pdf";
                if (hasPdfAttach) {
                    final File f = new File(tmpfile);
                    doc.save(f);
                    doc.close();
                     //i have try with parser but without success too
                    // PDFParser parser = new PDFParser(new FileInputStream(tmpfile));
                    // parser.parse();
                    // PDDocument doc2 = parser.getPDDocument();
                    final PDDocument doc2 = PDDocument.loadNonSeq(f, new RandomAccessFile(new File(getHomeTMP()
                            + "tempppp.pdf"), "r"));
                    doc2.save(out);
                    doc2.close();
                } else {
                    doc.save(out);
                    doc.close();
                }
                 //that does not work too
                // final InputStream in = new FileInputStream(tmpfile);
                // IOUtils.copy(in, out);
                // out = new FileOutputStream(tmpFile);
                // doc.save (out);

            } catch (IOException e1) {
                e1.printStackTrace();
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }

Best regards

Solution:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

    final PDDocument doc;
    try {
        doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
        ((ByteArrayOutputStream) out).reset();
        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        final Map embeddedFileMap = new HashMap();
        PDEmbeddedFile embeddedFile;
        File file = null;

        for (Attachment attach : attachmentsResources) {

            // first create the file specification, which holds the embedded file
            final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
            fileSpecification.setFile(attach.getFilename());
            file = AttachmentUtils.getAttachmentFile(attach);
            final InputStream is = new FileInputStream(file.getAbsolutePath());

            embeddedFile = new PDEmbeddedFile(doc, is);
            // set some of the attributes of the embedded file
            embeddedFile.setSubtype(attach.getMimetype());
            embeddedFile.setSize((int) (long) attach.getFilesize());
            fileSpecification.setEmbeddedFile(embeddedFile);

            // now add the entry to the embedded file tree and set in the document.
            embeddedFileMap.put(attach.getFilename(), fileSpecification);

        }
        efTree.setNames(embeddedFileMap);
        ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
        // attachments are stored as part of the "names" dictionary in the document catalog
        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        doc.getDocumentCatalog().setNames(names);
        ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
        doc.save(out);
        doc.close();

    } catch (IOException e1) {
        e1.printStackTrace();
    } catch (Exception e2) {
        e2.printStackTrace();
    }
}

解决方案

You store the new PDF after the original PDF in out:

Look at all the uses of out in your method:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
    ...
            doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
    ...
                doc2.save(out);
    ...
                doc.save(out);

So you get as input a ByteArrayOutputStream and take its current content as input (i.e. the ByteArrayOutputStream is not empty but already contains a PDF) and after some processing you append the modified PDF to the ByteArrayOutputStream. Depending on the PDF viewer you present this to, you will be shown either the original or the manipulated PDF or a (very correct) error message that the file is garbage.

If you want the ByteArrayOutputStream to contain only the manipulated PDF, simply add

((ByteArrayOutputStream) out).reset();

or (if you are not sure about the state of the stream)

out = new ByteArrayOutputStream();

right after

doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));

PS: According to the comments the OP tried the above proposed changes to his code without success.

I cannot run the code as presented in the question because it is not self-contained. Thus, I reduced it to the essentials to get a self-contained test:

@Test
public void test() throws IOException, COSVisitorException
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (
            InputStream sourceStream = getClass().getResourceAsStream("test.pdf");
            InputStream attachStream = getClass().getResourceAsStream("artificial text.pdf"))
    {
        final PDDocument document = PDDocument.load(sourceStream);

        final PDEmbeddedFile embeddedFile = new PDEmbeddedFile(document, attachStream);
        embeddedFile.setSubtype("application/pdf");
        embeddedFile.setSize(10993);

        final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
        fileSpecification.setFile("artificial text.pdf");
        fileSpecification.setEmbeddedFile(embeddedFile);

        final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();
        embeddedFileMap.put("artificial text.pdf", fileSpecification);

        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        efTree.setNames(embeddedFileMap);

        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(document.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        document.getDocumentCatalog().setNames(names);

        document.save(baos);
        document.close();
    }
    Files.write(Paths.get("attachment.pdf"), baos.toByteArray());
}

As you see PDFBox here uses only streams. The result:

Thus, PDFBox without problem stores a PDF into which it has embedded a PDF file attachment.

The problem, therefore, most likely have nothing to do with this work flow as such

这篇关于pdfbox-添加pdf嵌入式文件并将PDDocument保存到OutputStream不会保留嵌入式文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆