Error while retrieving images from PDF using iText


Question

I have an existing PDF from which I want to retrieve images

Note:

In the Documentation, this is the RESULT variable

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

I don't understand why this image is needed. I just want to extract the images from my PDF file.

So now, when I use MyImageRenderListener listener = new MyImageRenderListener(RESULT); I get this error:

results\part4\chapter15\Img16.jpg (The system cannot find the path specified)

Here is my code:

    package part4.chapter15;

    import java.io.IOException;


    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfReaderContentParser;

    /**
     * Extracts images from a PDF file.
     */
    public class ExtractImages {

        /** The PDF from which the images are extracted. */
        public static final String RESOURCE = "resources/pdfs/samplefile.pdf";
        public static final String RESULT = "results/part4/chapter15/Img%s.%s";

        /**
         * Parses a PDF and extracts all the images.
         * @param filename the source PDF
         */
        public void extractImages(String filename)
            throws IOException, DocumentException {
            PdfReader reader = new PdfReader(filename);
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            MyImageRenderListener listener = new MyImageRenderListener(RESULT);
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                parser.processContent(i, listener);
            }
            reader.close();
        }

        /**
         * Main method.
         * @param    args    no arguments needed
         * @throws DocumentException
         * @throws IOException
         */
        public static void main(String[] args) throws IOException, DocumentException {
            new ExtractImages().extractImages(RESOURCE);
        }
    }


Answer

You have two questions and the answer to the first question is the key to the answer of the second.

Question 1:

You refer to:

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

You ask: why is this image needed?

That question is wrong, because Img%s.%s is not a filename of an image, it's a pattern of the filename of an image. While parsing, iText will detect images in the PDF. These images are stored in numbered objects (e.g. object 16) and these images can be exported in different formats (e.g. jpg, png,...).

Suppose that an image is stored in object 16 and that this image is a jpg, then the pattern will resolve to Img16.jpg.
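To make the pattern resolution concrete, here is a minimal sketch in plain Java. The class name PatternDemo and the helper resolve are hypothetical names used only for illustration; the String.format call itself is the same one the listener uses.

```java
public class PatternDemo {

    /** Fills the two %s placeholders: object number first, file extension second. */
    static String resolve(String pattern, int objectNumber, String fileType) {
        return String.format(pattern, objectNumber, fileType);
    }

    public static void main(String[] args) {
        String pattern = "results/part4/chapter15/Img%s.%s";
        // An image in object 16 that is a jpg:
        System.out.println(resolve(pattern, 16, "jpg"));
        // A hypothetical second image in object 23 that is a png:
        System.out.println(resolve(pattern, 23, "png"));
    }
}
```

The first call yields results/part4/chapter15/Img16.jpg, which is exactly the path that shows up in the error message.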

Question 2:

Why do you get this error:

results\part4\chapter15\Img16.jpg (The system cannot find the path specified)

In your PDF, there's a jpg stored in object 16. You are asking iText to store that image using this path: results\part4\chapter15\Img16.jpg (as explained in my answer to Question 1). However, your working directory doesn't have the subdirectories results\part4\chapter15\, hence an IOException is thrown (a FileNotFoundException, to be precise, which is a subclass of IOException).
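The failure can be reproduced without iText. The following sketch is an illustration only: the class MissingDirectoryDemo and its methods are hypothetical, and it works in a temporary directory rather than your project folder. It tries to open a FileOutputStream below directories that do not exist yet, then creates them and tries again.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingDirectoryDemo {

    /** Tries to write a few bytes; returns "ok" or the kind of exception seen. */
    static String tryWrite(Path target) {
        try (FileOutputStream os = new FileOutputStream(target.toFile())) {
            os.write(new byte[] {1, 2, 3});
            return "ok";
        } catch (FileNotFoundException e) {
            return "FileNotFoundException";
        } catch (IOException e) {
            return "IOException";
        }
    }

    /** Attempts the write before and after creating the parent directories. */
    static String[] run() {
        try {
            Path base = Files.createTempDirectory("img-demo");
            Path target = base.resolve("results/part4/chapter15/Img16.jpg");
            String before = tryWrite(target);            // parent directories are missing
            Files.createDirectories(target.getParent()); // create results/part4/chapter15
            String after = tryWrite(target);             // same path, directories now exist
            return new String[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("before: " + result[0] + ", after: " + result[1]);
    }
}
```

FileOutputStream creates the file if needed, but it never creates missing parent directories, which is why the first attempt fails and the second succeeds.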

What is the general problem?

You have copy/pasted the ExtractImages example I wrote for my book "iText in Action - Second Edition", but:


  1. You didn't read that book, so you have no idea what that code is supposed to do.
  2. You aren't telling the readers on StackOverflow that this example depends on the MyImageRenderListener class, which is where all the magic happens.

How to solve the problem?

Option 1:

Change RESULT like this:

public static final String RESULT = "Img%s.%s";

Now the images will be stored in your working directory.

Option 2:

Adapt the MyImageRenderListener class, more specifically this method:

public void renderImage(ImageRenderInfo renderInfo) {
    try {
        String filename;
        FileOutputStream os;
        PdfImageObject image = renderInfo.getImage();
        if (image == null) return;
        filename = String.format(path,
            renderInfo.getRef().getNumber(), image.getFileType());
        os = new FileOutputStream(filename);
        os.write(image.getImageAsBytes());
        os.flush();
        os.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}

iText calls this method whenever an image is encountered. It passes an ImageRenderInfo object to this method, which contains plenty of information about that image.

In this implementation, we store the image bytes as a file. This is how we create the path to that file:

String.format(path,
     renderInfo.getRef().getNumber(), image.getFileType())

As you can see, the pattern stored in RESULT is used in such a way that the first occurrence of %s is replaced with a number and the second occurrence with a file extension.
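One possible way to adapt the method for Option 2 is to create any missing parent directories before writing the file. The sketch below is an assumption about how you might do that, not code from the book: ImageFileWriter, write and demo are hypothetical names, and the helper uses only the standard library so it can be tried without iText.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ImageFileWriter {

    /**
     * Hypothetical helper: resolves the pattern relative to baseDir, creates
     * missing parent directories, then writes the bytes. Returns the file path.
     */
    static Path write(Path baseDir, String pattern, int objectNumber,
                      String fileType, byte[] bytes) throws IOException {
        Path target = baseDir.resolve(String.format(pattern, objectNumber, fileType));
        Path parent = target.getParent();
        if (parent != null) {                 // null if there is no directory part
            Files.createDirectories(parent);  // no-op when the directories exist
        }
        return Files.write(target, bytes);
    }

    /** Writes three bytes below a fresh temporary directory; returns the file size. */
    static long demo() {
        try {
            Path base = Files.createTempDirectory("img-out");
            Path written = write(base, "results/part4/chapter15/Img%s.%s",
                                 16, "jpg", new byte[] {1, 2, 3});
            return Files.size(written);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + demo());
    }
}
```

Inside renderImage, such a helper could replace the FileOutputStream lines: pass it the pattern, renderInfo.getRef().getNumber(), image.getFileType() and image.getImageAsBytes(), with the working directory as baseDir.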

You could easily adapt this method so that it stores the images as byte[] in a List if that is what you want.
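As a rough illustration of that last idea, the following sketch keeps the image bytes in memory instead of writing files. InMemoryImageCollector is a hypothetical class that is deliberately independent of iText; in an adapted renderImage you would call add(image.getImageAsBytes()) where the original opens a FileOutputStream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class InMemoryImageCollector {

    private final List<byte[]> images = new ArrayList<>();

    /** Stores a copy of the image bytes; null input is skipped. */
    public void add(byte[] imageBytes) {
        if (imageBytes == null) {
            return;
        }
        images.add(imageBytes.clone());
    }

    /** Read-only view of the collected images, in the order they were added. */
    public List<byte[]> getImages() {
        return Collections.unmodifiableList(images);
    }

    public static void main(String[] args) {
        InMemoryImageCollector collector = new InMemoryImageCollector();
        collector.add(new byte[] {1, 2, 3}); // stands in for real image bytes
        collector.add(null);                 // ignored
        System.out.println("images collected: " + collector.getImages().size());
    }
}
```

With this approach no output path is involved at all, so the missing-directory problem cannot occur, at the cost of holding every image in memory.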
