How can I extract an image from a button icon in a PDF using Apache PDFBox?


Problem description


I want to get the image icon from a button in a PDF using Java (NetBeans) and put it in a panel. However, I have hit a brick wall here. I'm using PDFBox as my PDF exporter, and I can't seem to understand it well enough. I have already succeeded in reading from the form fields, but as far as I can find, there is no button extractor in PDFBox. How should I do it? Is it possible with this approach, or is there another way? Thanks in advance.

Edit: I have already found how to extract images with the example utility, using this code:

File myFile = new File(filename);
try {
    // PDDocument pdDoc = PDDocument.loadNonSeq( myFile, null );
    PDDocument pdDoc = PDDocument.load( myFile );
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();
    // used to read the contents of the file

    List pages = pdDoc.getDocumentCatalog().getAllPages();
    Iterator iter = pages.iterator();
    while( iter.hasNext() )
    {
        PDPage page = (PDPage)iter.next();
        PDResources resources = page.getResources();
        // only the images in the page resources are visited here
        Map images = resources.getImages();
        if( images != null )
        {
            Iterator imageIter = images.keySet().iterator();
            while( imageIter.hasNext() )
            {
                String key = (String)imageIter.next();
                PDXObjectImage image = (PDXObjectImage)images.get(key);
                BufferedImage imagedisplay = image.getRGBImage();
                // each image replaces the previous icon of the label
                jLabel5.setIcon(new ImageIcon(imagedisplay)); // NOI18N
            }
        }
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, "error " + e.getMessage());
}

However, I still fail to read the button images. By the way, I followed the tutorial on this page to add button images to the PDF: https://acrobatusers.com/tutorials/how-to-create-a-button-form-field-to-insert-a-pdf-file

2nd edit: Here is also the link to the PDF that has an icon in it: PDF Link. Thank you in advance.

Solution

I assume you mean interactive form buttons when you talk about buttons in PDFs.

In general

There is no explicit icon extractor for buttons in PDFBox. But as buttons (and annotations in general) with custom icons have these icons defined as part of their appearances, one can simply (recursively) traverse the resources of the appearances of the annotations and collect the XObjects with subtype Image:

public void extractAnnotationImages(PDDocument document, String fileNameFormat) throws IOException
{
    List<PDPage> pages = document.getDocumentCatalog().getAllPages();
    if (pages == null)
        return;

    for (int i = 0; i < pages.size(); i++)
    {
        String pageFormat = String.format(fileNameFormat, "-" + i + "%s", "%s");
        extractAnnotationImages(pages.get(i), pageFormat);
    }
}

public void extractAnnotationImages(PDPage page, String pageFormat) throws IOException
{
    List<PDAnnotation> annotations = page.getAnnotations();
    if (annotations == null)
        return;

    for (int i = 0; i < annotations.size(); i++)
    {
        PDAnnotation annotation = annotations.get(i);
        String annotationFormat = annotation.getAnnotationName() != null && annotation.getAnnotationName().length() > 0
                ? String.format(pageFormat, "-" + annotation.getAnnotationName() + "%s", "%s")
                : String.format(pageFormat, "-" + i + "%s", "%s");
        extractAnnotationImages(annotation, annotationFormat);
    }
}

public void extractAnnotationImages(PDAnnotation annotation, String annotationFormat) throws IOException
{
    PDAppearanceDictionary appearance = annotation.getAppearance();
    // Annotations without an appearance dictionary have no icons to extract.
    if (appearance == null)
        return;

    extractAnnotationImages(appearance.getDownAppearance(), String.format(annotationFormat, "-Down%s", "%s"));
    extractAnnotationImages(appearance.getNormalAppearance(), String.format(annotationFormat, "-Normal%s", "%s"));
    extractAnnotationImages(appearance.getRolloverAppearance(), String.format(annotationFormat, "-Rollover%s", "%s"));
}

public void extractAnnotationImages(Map<String, PDAppearanceStream> stateAppearances, String stateFormat) throws IOException
{
    if (stateAppearances == null)
        return;

    for (Map.Entry<String, PDAppearanceStream> entry: stateAppearances.entrySet())
    {
        String appearanceFormat = String.format(stateFormat, "-" + entry.getKey() + "%s", "%s");
        extractAnnotationImages(entry.getValue(), appearanceFormat);
    }
}

public void extractAnnotationImages(PDAppearanceStream appearance, String appearanceFormat) throws IOException
{
    PDResources resources = appearance.getResources();
    if (resources == null)
        return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null)
        return;

    for (Map.Entry<String, PDXObject> entry : xObjects.entrySet())
    {
        PDXObject xObject = entry.getValue();
        String xObjectFormat = String.format(appearanceFormat, "-" + entry.getKey() + "%s", "%s");
        if (xObject instanceof PDXObjectForm)
            extractAnnotationImages((PDXObjectForm)xObject, xObjectFormat);
        else if (xObject instanceof PDXObjectImage)
            extractAnnotationImages((PDXObjectImage)xObject, xObjectFormat);
    }
}

public void extractAnnotationImages(PDXObjectForm form, String imageFormat) throws IOException
{
    PDResources resources = form.getResources();
    if (resources == null)
        return;
    Map<String, PDXObject> xObjects = resources.getXObjects();
    if (xObjects == null)
        return;

    for (Map.Entry<String, PDXObject> entry : xObjects.entrySet())
    {
        PDXObject xObject = entry.getValue();
        String xObjectFormat = String.format(imageFormat, "-" + entry.getKey() + "%s", "%s");
        if (xObject instanceof PDXObjectForm)
            extractAnnotationImages((PDXObjectForm)xObject, xObjectFormat);
        else if (xObject instanceof PDXObjectImage)
            extractAnnotationImages((PDXObjectImage)xObject, xObjectFormat);
    }
}

public void extractAnnotationImages(PDXObjectImage image, String imageFormat) throws IOException
{
    // Close the file stream even if writing the image fails.
    try (FileOutputStream out = new FileOutputStream(String.format(imageFormat, "", image.getSuffix())))
    {
        image.write2OutputStream(out);
    }
}

(from ExtractAnnotationImageTest.java)
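
The file names in the code above are built by formatting one template again and again: each level replaces the first %s with its own name part followed by a fresh %s and passes the second %s through unchanged, and the last level fills in an empty string and the file suffix. A minimal stand-alone sketch of that chaining, using only String.format, might look like this. The class name, the method names and the sample name parts are made up for illustration and are not part of PDFBox or of the test class:

```java
final class FormatChainDemo {
    // Appends one name part to the template while keeping both placeholders alive.
    static String addPart(String template, String part) {
        return String.format(template, "-" + part + "%s", "%s");
    }

    // Final step: close the name placeholder and fill in the file suffix.
    static String finish(String template, String suffix) {
        return String.format(template, "", suffix);
    }

    public static void main(String[] args) {
        String template = "buttons%s.%s";
        template = addPart(template, "0");       // page index (sample value)
        template = addPart(template, "1");       // annotation index or name (sample value)
        template = addPart(template, "Normal");  // appearance type
        template = addPart(template, "default"); // appearance state key (sample value)
        template = addPart(template, "img0");    // XObject resource name (sample value)
        System.out.println(finish(template, "png")); // buttons-0-1-Normal-default-img0.png
    }
}
```

The design keeps every method signature down to one String argument for naming, at the price that the template is re-parsed at every level.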

Unfortunately, the OP did not provide a sample PDF, so I applied the code to the example file buttons.pdf (stored as a resource) like this:

/**
 * Test using <a href="http://examples.itextpdf.com/results/part2/chapter08/buttons.pdf">buttons.pdf</a>
 * created by <a href="http://itextpdf.com/examples/iia.php?id=154">part2.chapter08.Buttons</a>
 * from ITEXT IN ACTION — SECOND EDITION.
 */
@Test
public void testButtonsPdf() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("buttons.pdf");
         PDDocument document = PDDocument.load(resource))
    {
        extractAnnotationImages(document, new File(RESULT_FOLDER, "buttons%s.%s").toString());
    }
}

(from ExtractAnnotationImageTest.java)

and got two images (not reproduced here).

There are two issues here:

  • We extract all image resources attached to the annotation appearance and do not check whether they actually are used anywhere in the appearance stream. Thus, you might find more icons than expected. In the case above, the first image is not used as individual resource but only as mask for the second one.
  • We extract only image resources, not inline images, and so may miss some images.

Thus, please check this code with your PDFs. If need be, it can be improved.
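
One possible improvement for both issues is to look at the appearance's content stream itself: image XObjects are painted with the Do operator, and inline images start with the BI operator. The following is only a rough sketch under strong assumptions. It works on an already decoded content stream given as a String, uses regular expressions instead of a real content stream parser, and would be confused by operator-like text inside string literals or binary inline image data. The class and method names are hypothetical, and Do also paints form XObjects, so the result would still have to be matched against the image resources:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentStreamScan {
    // A name token followed by the Do operator, e.g. "/img1 Do".
    private static final Pattern DO_OPERATOR =
            Pattern.compile("/([^\\s/\\[\\]()<>{}%]+)\\s+Do(?=\\s|$)");

    // A whitespace-delimited BI token, which starts an inline image.
    private static final Pattern BEGIN_INLINE_IMAGE =
            Pattern.compile("(?<!\\S)BI(?!\\S)");

    // Names of all XObjects that the content stream paints with Do.
    static Set<String> usedXObjectNames(String content) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = DO_OPERATOR.matcher(content);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Number of inline images started in the content stream.
    static int countInlineImages(String content) {
        int count = 0;
        Matcher m = BEGIN_INLINE_IMAGE.matcher(content);
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hand-written sample content, not taken from any of the PDFs discussed here.
        String content = "q 20 0 0 20 2 2 cm /img1 Do Q\n"
                + "q BI /W 1 /H 1 /CS /G /BPC 8 ID x EI Q";
        System.out.println(usedXObjectNames(content)); // [img1]
        System.out.println(countInlineImages(content)); // 1
    }
}
```

An image that only serves as a mask for another image is never painted with Do, so a check like this would skip it.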

The OP's file

The OP meanwhile has provided a sample file imageicon.pdf

Calling the methods above like this

/**
 * Test using <a href="http://www.docdroid.net/TDGVQzg/imageicon.pdf.html">imageicon.pdf</a>
 * created by the OP.
 */
@Test
public void testImageiconPdf() throws IOException
{
    try (InputStream resource = getClass().getResourceAsStream("imageicon.pdf");
         PDDocument document = PDDocument.load(resource))
    {
        extractAnnotationImages(document, new File(RESULT_FOLDER, "imageicon%s.%s").toString());
    }
}

(from ExtractAnnotationImageTest.java)

outputs a single image (not reproduced here).

Thus, it works just fine!

Starting as a stand-alone tool

The OP indicated in a comment to be

still confuse using junit testing method, however when i try to call it into my main program, it always return with "stream close" error. I already put the file as the same directory as my jar, also trying to give the path manually, but still the same error.

Thus, I added a main method to the class to allow it to

  1. be started without the JUnit framework and
  2. extract from PDFs anywhere in the local file system given by their file names on the command line.

In code:

public static void main(String[] args) throws IOException
{
    ExtractAnnotationImageTest extractor = new ExtractAnnotationImageTest();

    for (String arg : args)
    {
        try (PDDocument document = PDDocument.load(arg))
        {
            extractor.extractAnnotationImages(document, arg + "%s.%s");
        }
    }
}

(from ExtractAnnotationImageTest.java)
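
One caveat when running the tool this way: the main method uses each command-line argument as the start of a format template, so a path containing a percent sign would be misread by String.format. A small hedged sketch of a pre-check follows. The class and method names are hypothetical, and skipping such files is just one possible policy:

```java
final class TemplatePrefixCheck {
    // True when the path can be used as a literal prefix of a format template.
    static boolean isSafePrefix(String path) {
        return path.indexOf('%') < 0;
    }

    // Performs the first formatting step the way the page-level method does for page 0.
    static String firstStep(String path) {
        return String.format(path + "%s.%s", "-0%s", "%s");
    }

    public static void main(String[] args) {
        for (String arg : args) {
            if (!isSafePrefix(arg)) {
                // For example "100%done.pdf" would make String.format throw.
                System.err.println("Skipping (contains '%'): " + arg);
                continue;
            }
            System.out.println(firstStep(arg));
        }
    }
}
```

Escaping the percent sign once is not enough here, because the template is formatted again at every level of the traversal.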
