Extract Multiple Embedded Images from a Single PDF Page Using PDFBox

Question

Friends, I am using PDFBox 2.0.6. I have been successful in extracting images from the PDF file, but right now my code creates one image per PDF page. The issue is that a PDF page can contain any number of images, and I want each embedded image to be extracted as a separate image.

Here is the code:

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public class DemoPdf {

    public static void main(String args[]) throws Exception {
        //Loading an existing PDF document
        File file = new File("C:/Users/ADMIN/Downloads/Vehicle_Photographs.pdf");
        PDDocument document = PDDocument.load(file);
        //Instantiating the PDFRenderer class
        PDFRenderer renderer = new PDFRenderer(document);
        File imageFolder = new File("C:/Users/ADMIN/Desktop/image");

        for (int page = 0; page < document.getNumberOfPages(); ++page) {
            //Rendering an image from the PDF document
            BufferedImage image = renderer.renderImage(page);
            //Writing the image to a file
            ImageIO.write(image, "JPEG", new File(imageFolder+"/" + page +".jpg"));
            System.out.println("Image created"+ page);
        }
        //Closing the document
        document.close();
    }

}   

Is it possible in PDFBox to extract all embedded images as separate images? Thanks.

Recommended answer

Yes. It is possible to extract all images from all the pages of a PDF.

You may refer to this link: extract images from pdf using PDFBox.

The basic idea is to extend PDFStreamEngine and override its processOperator method, then call PDFStreamEngine.processPage for every page. When the operator passed to processOperator is Do and the XObject it names is an image object, get the BufferedImage from that object and save it.
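
In case the link is unavailable, here is a minimal sketch of that idea, assuming PDFBox 2.0.x (the version in the question). The class name EmbeddedImageExtractor, the extractAll helper, the command-line arguments, and the output file naming are all made up for illustration and are not part of PDFBox. It does not handle inline images (the BI operator), only image XObjects.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper class (not part of PDFBox); a sketch for PDFBox 2.0.x.
class EmbeddedImageExtractor extends PDFStreamEngine {

    private final File outputDir;
    private int pageNumber;  // 1-based page currently being processed
    private int imageCount;  // images written so far, across all pages

    EmbeddedImageExtractor(File outputDir) {
        this.outputDir = outputDir;
    }

    @Override
    protected void processOperator(Operator operator, List<COSBase> operands) throws IOException {
        // "Do" paints an XObject; its single operand is the XObject's resource name.
        if ("Do".equals(operator.getName())
                && !operands.isEmpty() && operands.get(0) instanceof COSName) {
            PDXObject xobject = getResources().getXObject((COSName) operands.get(0));
            if (xobject instanceof PDImageXObject) {
                BufferedImage image = ((PDImageXObject) xobject).getImage();
                imageCount++;
                // Assumed naming scheme; PNG is used because it also copes with transparency.
                File target = new File(outputDir,
                        "page" + pageNumber + "_image" + imageCount + ".png");
                ImageIO.write(image, "png", target);
            } else if (xobject instanceof PDFormXObject) {
                // Form XObjects have their own content stream, which may draw more images.
                showForm((PDFormXObject) xobject);
            }
        } else {
            super.processOperator(operator, operands);
        }
    }

    // Walks every page and returns how many images were written.
    static int extractAll(PDDocument document, File outputDir) throws IOException {
        EmbeddedImageExtractor extractor = new EmbeddedImageExtractor(outputDir);
        for (PDPage page : document.getPages()) {
            extractor.pageNumber++;
            extractor.processPage(page);
        }
        return extractor.imageCount;
    }

    // Usage: java EmbeddedImageExtractor <input.pdf> <output-directory>
    public static void main(String[] args) throws IOException {
        File outputDir = new File(args[1]);
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IOException("Cannot create output directory " + outputDir);
        }
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            System.out.println("Images written: " + extractAll(document, outputDir));
        }
    }
}
```

Note that an image drawn several times on a page is written once per Do call with this sketch. If that matters, you could remember which image objects were already saved and skip the repeats.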