Extract Images from PDF coordinates using iText
Question
I found some examples of how to extract images from a PDF using iText. What I am looking for, however, is a way to get images from a PDF by their coordinates.
Is it possible? If so, how can it be done?
Recommended answer
Along the lines of the iText example ExtractImages, you can extract the images with code like this:
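    // resourceStream is the InputStream of the PDF to read
    PdfReader reader = new PdfReader(resourceStream);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    // "testpdf" is the file name prefix for the extracted images
    ImageRenderListener listener = new ImageRenderListener("testpdf");
    // Page numbers are 1-based
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        parser.processContent(i, listener);
    }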
ImageRenderListener is defined as follows:
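    import java.io.FileOutputStream;
    import java.io.IOException;

    import com.itextpdf.text.pdf.PRStream;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.parser.ImageRenderInfo;
    import com.itextpdf.text.pdf.parser.PdfImageObject;
    import com.itextpdf.text.pdf.parser.RenderListener;
    import com.itextpdf.text.pdf.parser.TextRenderInfo;

    class ImageRenderListener implements RenderListener
    {
        final String name;
        // Fallback numbering for images without an indirect object reference
        int counter = 100000;

        public ImageRenderListener(String name)
        {
            this.name = name;
        }

        // Text events are ignored; only images are of interest here
        public void beginTextBlock() { }
        public void renderText(TextRenderInfo renderInfo) { }
        public void endTextBlock() { }

        public void renderImage(ImageRenderInfo renderInfo)
        {
            try
            {
                PdfImageObject image = renderInfo.getImage();
                if (image == null) return;

                int number = renderInfo.getRef() != null ? renderInfo.getRef().getNumber() : counter++;
                String filename = String.format("%s-%s.%s", name, number, image.getFileType());
                // try-with-resources closes the stream even if the write fails
                try (FileOutputStream os = new FileOutputStream(filename))
                {
                    os.write(image.getImageAsBytes());
                }

                // If the image has a soft mask, save it as a separate file
                PdfDictionary imageDictionary = image.getDictionary();
                PRStream maskStream = (PRStream) imageDictionary.getAsStream(PdfName.SMASK);
                if (maskStream != null)
                {
                    PdfImageObject maskImage = new PdfImageObject(maskStream);
                    filename = String.format("%s-%s-mask.%s", name, number, maskImage.getFileType());
                    try (FileOutputStream os = new FileOutputStream(filename))
                    {
                        os.write(maskImage.getImageAsBytes());
                    }
                }
            }
            catch (IOException e)
            {
                e.printStackTrace();
            }
        }
    }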
As you can see, the ImageRenderListener method renderImage receives an ImageRenderInfo argument. This argument has the methods
- getStartPoint, giving you a vector in user space representing the start point of the xobject, and
- getImageCTM, giving you the coordinate transformation matrix that was active when this image was rendered. Coordinates are in user space.
The latter tells you exactly which manipulations of a 1x1 user space unit square are used to actually draw the image. As you are aware, an image may be rotated, stretched, skewed, and moved (the former method actually extracts its result from the "moved" part of the matrix).
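To make the coordinate part concrete, here is a minimal sketch of how the matrix could be turned into a bounding box and compared with a region of interest. It deliberately uses plain numbers instead of iText types so it runs on its own. The class ImageBounds and its methods boundingBox and overlaps are hypothetical names introduced for illustration, not part of iText. The six parameters a, b, c, d, e, f are the usual PDF matrix entries; assuming iText 5.x, they should be readable inside renderImage from renderInfo.getImageCTM() via get(Matrix.I11), get(Matrix.I12), get(Matrix.I21), get(Matrix.I22), get(Matrix.I31) and get(Matrix.I32).

```java
import java.util.Arrays;

// Hypothetical helper, not part of iText: works on the six PDF matrix entries only.
class ImageBounds {

    /**
     * Maps the corners of the 1x1 unit square through the matrix [a b c d e f]
     * and returns the enclosing box as {minX, minY, maxX, maxY} in user space.
     */
    public static double[] boundingBox(double a, double b, double c,
                                       double d, double e, double f) {
        double[][] unitSquare = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
        double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;
        for (double[] p : unitSquare) {
            // PDF convention: x' = a*x + c*y + e, y' = b*x + d*y + f
            double x = a * p[0] + c * p[1] + e;
            double y = b * p[0] + d * p[1] + f;
            minX = Math.min(minX, x);
            minY = Math.min(minY, y);
            maxX = Math.max(maxX, x);
            maxY = Math.max(maxY, y);
        }
        return new double[] {minX, minY, maxX, maxY};
    }

    /** True if the box overlaps the rectangle (llx, lly)-(urx, ury). */
    public static boolean overlaps(double[] box, double llx, double lly,
                                   double urx, double ury) {
        return box[0] < urx && box[2] > llx && box[1] < ury && box[3] > lly;
    }

    public static void main(String[] args) {
        // Example values: an unrotated 200x100 image with its lower-left corner at (50, 700)
        double[] box = boundingBox(200, 0, 0, 100, 50, 700);
        System.out.println(Arrays.toString(box));
        // Does the image touch the region (0, 650)-(300, 850)?
        System.out.println(overlaps(box, 0, 650, 300, 850));
    }
}
```

In a listener like the one above, such a check could run at the start of renderImage so that only images overlapping the requested region are written to disk. Using the bounding box of all four corners, rather than just the start point, keeps the test meaningful for rotated or skewed images.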