Extract Images from PDF coordinates using iText


Question

I found some examples of how to extract images from a PDF using iText. What I am looking for, however, is a way to get images from a PDF by their coordinates.

Is this possible? If so, how can it be done?

Recommended answer

Along the lines of the iText example ExtractImages, you can extract images with code like this:

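import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;

PdfReader reader = new PdfReader(resourceStream);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
ImageRenderListener listener = new ImageRenderListener("testpdf");

// Page numbers are 1-based; the listener is called for every image drawn on each page.
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    parser.processContent(i, listener);
}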

ImageRenderListener is defined as follows:

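import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

class ImageRenderListener implements RenderListener
{
    final String name;
    // Fallback numbering for inline images, which have no indirect reference.
    int counter = 100000;

    public ImageRenderListener(String name)
    {
        this.name = name;
    }

    // Text events are not needed for image extraction.
    public void beginTextBlock() { }
    public void renderText(TextRenderInfo renderInfo) { }
    public void endTextBlock() { }

    public void renderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            PdfImageObject image = renderInfo.getImage();
            if (image == null) return;
            int number = renderInfo.getRef() != null ? renderInfo.getRef().getNumber() : counter++;
            String filename = String.format("%s-%s.%s", name, number, image.getFileType());
            // try-with-resources closes the stream even if the write fails.
            try (OutputStream os = new FileOutputStream(filename))
            {
                os.write(image.getImageAsBytes());
            }

            // If the image has a soft mask (transparency), save it as a separate file.
            PdfDictionary imageDictionary = image.getDictionary();
            PRStream maskStream = (PRStream) imageDictionary.getAsStream(PdfName.SMASK);
            if (maskStream != null)
            {
                PdfImageObject maskImage = new PdfImageObject(maskStream);
                filename = String.format("%s-%s-mask.%s", name, number, maskImage.getFileType());
                try (OutputStream os = new FileOutputStream(filename))
                {
                    os.write(maskImage.getImageAsBytes());
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}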

As you can see, the ImageRenderListener method renderImage receives an ImageRenderInfo argument. This argument has the following methods:


  • getStartPoint, which gives you a vector in user space representing the start point of the XObject, and
  • getImageCTM, which gives you the coordinate transformation matrix that was active when this image was rendered. Coordinates are in user space.

The latter tells you exactly which transformation is applied to a 1x1 user space unit square to actually draw the image. As you are aware, an image may be rotated, stretched, skewed, and moved (the former method actually extracts its result from the translation, or "moved", part of that matrix).
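To select images by coordinates, one option is to turn that matrix into a bounding box and compare it with the region you are interested in. The following is only a minimal sketch in plain Java, not part of iText: the class ImageBounds and its methods fromCtm and intersects are hypothetical names introduced here for illustration. It assumes the usual PDF matrix layout [a b 0; c d 0; e f 1], in which a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f).

```java
/**
 * Hypothetical helper (not an iText class): an axis-aligned box in user space,
 * computed from the six entries of an image's transformation matrix.
 */
public final class ImageBounds {
    public final float llx, lly, urx, ury; // lower-left and upper-right corners

    public ImageBounds(float llx, float lly, float urx, float ury) {
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }

    /**
     * Maps the four corners of the unit square through the matrix
     * [a b 0; c d 0; e f 1] and returns the smallest box containing them.
     */
    public static ImageBounds fromCtm(float a, float b, float c, float d, float e, float f) {
        // Images of the corners (0,0), (1,0), (0,1), (1,1).
        float[] xs = { e, a + e, c + e, a + c + e };
        float[] ys = { f, b + f, d + f, b + d + f };
        float minX = xs[0], maxX = xs[0], minY = ys[0], maxY = ys[0];
        for (int i = 1; i < 4; i++) {
            minX = Math.min(minX, xs[i]);
            maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]);
            maxY = Math.max(maxY, ys[i]);
        }
        return new ImageBounds(minX, minY, maxX, maxY);
    }

    /** True if the two boxes overlap in an area; boxes that only touch at an edge do not count. */
    public boolean intersects(ImageBounds other) {
        return llx < other.urx && urx > other.llx
            && lly < other.ury && ury > other.lly;
    }
}
```

Inside renderImage you could then read the six entries from the matrix returned by renderInfo.getImageCTM() (in iText 5 this should be possible with Matrix.get and the constants Matrix.I11, I12, I21, I22, I31, I32, but check this against the version you use), build an ImageBounds from them, and return early unless it intersects the target region. Note that for a rotated or skewed image the box is larger than the area the image actually covers, so this test can produce false positives.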
