PDFBox not returning the correct size of an image

Question

I am new to PDFBox and am stuck trying to find the height of an image in inches. After a couple of searches, this is the code I am working with:

PDResources resources = aPdPage.findResources();
graphicsState = new PDGraphicsState(aPdPage.findCropBox());
pageWidth = aPdPage.findCropBox().getWidth() / 72;
pageHeight = aPdPage.findCropBox().getHeight() / 72;
@SuppressWarnings("deprecation")
Map<String, PDXObjectImage> imageObjects = resources.getImages();
if (null == imageObjects || imageObjects.isEmpty())
    return;
for (Map.Entry<String, PDXObjectImage> entryxObjects : imageObjects.entrySet()) {

    PDXObjectImage image = entryxObjects.getValue();
    // System.out.println("bits per component: " + image.getBitsPerComponent());
    Matrix ctmNew = graphicsState.getCurrentTransformationMatrix();
    float imageXScale = ctmNew.getXScale();
    float imageYScale = ctmNew.getYScale();
    System.out.println("position = " + ctmNew.getXPosition() + ", " + ctmNew.getYPosition());
    // size in pixel
    System.out.println("size = " + image.getWidth() + "px, " + image.getHeight() + "px");
    // size in page units
    System.out.println("size = " + imageXScale + "pu, " + imageYScale + "pu");
    // size in inches
    imageXScale /= 72;
    imageYScale /= 72;
    System.out.println("size = " + imageXScale + "in, " + imageYScale + "in");
    // size in millimeter
    imageXScale *= 25.4;
    imageYScale *= 25.4;
    System.out.println("size = " + imageXScale + "mm, " + imageYScale + "mm");

    System.out.printf("dpi  = %.0f dpi (X), %.0f dpi (Y) %n", image.getWidth() * 72 / ctmNew.getXScale(), image.getHeight() * 72 / ctmNew.getYScale());

}

But the size in inches does not come out correctly: the imageXScale value in page units (pu) always comes out as 0.1.

Any help would be appreciated.

Recommended answer

First of all, you need to know how bitmap images are usually used in PDFs:

In a PDF, a page object has a collection of so-called resources, among them bitmap image resources, font resources, ...

You can inspect these resources like you currently do:

PDResources resources = aPdPage.findResources();
@SuppressWarnings("deprecation")
Map<String, PDXObjectImage> imageObjects = resources.getImages();
if (null == imageObjects || imageObjects.isEmpty())
    return;
for (Map.Entry<String, PDXObjectImage> entryxObjects : imageObjects.entrySet())
{
    PDXObjectImage image = entryxObjects.getValue();
    System.out.println("size = " + image.getWidth() + "px, " + image.getHeight() + "px");
}

But this only gives you the pixel dimensions of the images as they are stored in the page resources.

When such a resource is painted onto the page, the painting operation actually first scales it down to a 1x1 unit square and paints this scaled-down version.

The reason you nonetheless see images of reasonable size on screen and on paper is that the way painting operators work in PDFs is influenced by the so-called current graphics state. This graphics state contains information like the current fill color, line widths, etc. In particular, it also contains the so-called current transformation matrix, which defines how everything an operation draws is to be transformed: stretched, rotated, skewed, translated, ...
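To make the role of the current transformation matrix concrete, here is a minimal sketch that does not use PDFBox at all. It uses java.awt.geom.AffineTransform as a stand-in for the matrix, and the class name UnitSquareSketch, the helper toPage and the matrix values are assumptions chosen purely for illustration. It maps two corners of the 1x1 unit square into page coordinates.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Illustration only: maps the unit square an image occupies through a transformation matrix. */
final class UnitSquareSketch {

    /** Returns the page-space position of a point given in unit-square coordinates. */
    static Point2D toPage(AffineTransform ctm, double x, double y) {
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Hypothetical matrix: stretch to 200 x 100 page units and move to (50, 600).
        // The six constructor arguments are in the same order as the six numbers
        // a b c d e f of a PDF matrix.
        AffineTransform ctm = new AffineTransform(200.0, 0.0, 0.0, 100.0, 50.0, 600.0);

        // The image always occupies the square from (0, 0) to (1, 1) before transformation.
        System.out.println("lower left  = " + toPage(ctm, 0, 0));
        System.out.println("upper right = " + toPage(ctm, 1, 1));
    }
}
```

With this matrix the unit square ends up spanning 200 page units horizontally and 100 vertically, regardless of how many pixels the image resource has.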

The usual sequence of operations when drawing a bitmap image looks like this:

  • ...
  • store a temporary copy of the current graphics state,
  • change the current transformation matrix by a scaling transformation which multiplies the x coordinate by the desired width and the y coordinate by the desired height of the image to draw,
  • draw the image referenced in the resources, and
  • restore the current graphics state to the temporarily stored values,
  • ...
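The sequence above can be sketched with a small stand-in for the graphics state. This is not PDFBox code: the class GraphicsStateSketch and its methods are hypothetical names, and only the transformation matrix is tracked. The comments name the PDF operators each method is meant to mimic (q, cm, Do, Q).

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustration only: tracks just the transformation matrix of a graphics state. */
final class GraphicsStateSketch {
    private final Deque<AffineTransform> saved = new ArrayDeque<>();
    private AffineTransform ctm = new AffineTransform(); // starts as the identity matrix

    /** Mimics "q": store a temporary copy of the current state. */
    void save() {
        saved.push(new AffineTransform(ctm));
    }

    /** Mimics "cm": combine the given matrix with the current transformation matrix. */
    void concatenate(double a, double b, double c, double d, double e, double f) {
        ctm.concatenate(new AffineTransform(a, b, c, d, e, f));
    }

    /** Mimics "Do" for an image: reports the size the unit square has right now. */
    double[] imageSize() {
        return new double[] { ctm.getScaleX(), ctm.getScaleY() };
    }

    /** Mimics "Q": restore the most recently stored state. */
    void restore() {
        ctm = saved.pop();
    }

    public static void main(String[] args) {
        GraphicsStateSketch gs = new GraphicsStateSketch();
        gs.save();
        gs.concatenate(200, 0, 0, 100, 50, 600); // hypothetical image placement
        double[] size = gs.imageSize();
        System.out.println("while drawing: " + size[0] + " x " + size[1]);
        gs.restore();
        size = gs.imageSize();
        System.out.println("after restore: " + size[0] + " x " + size[1]);
    }
}
```

The scaling is only visible between the save and the restore, which is why the matrix has to be read at the moment the image is drawn. Reading getScaleX and getScaleY directly is a simplification that assumes the image is not rotated or skewed.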

Thus, to know the dimensions of the image on the page, you have to know the current transformation matrix as it is when the image drawing operation is executed.
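Once that matrix is known, the remaining arithmetic is small. The following sketch uses hypothetical names (ImageSizeSketch and its helpers) and assumes the default user space unit of 1/72 inch, i.e. a page without a UserUnit entry. It takes the matrix components a, b, c, d and derives the size in page units, inches and millimeters, plus the effective resolution. Using the length of each matrix column also covers rotated images, which a plain scale factor would not.

```java
/** Illustration only: size and resolution of an image from the matrix components a, b, c, d. */
final class ImageSizeSketch {
    static final double UNITS_PER_INCH = 72.0; // assumption: default user space unit
    static final double MM_PER_INCH = 25.4;

    /** Width on the page in page units: length of the transformed x axis (a, b). */
    static double widthInUnits(double a, double b) {
        return Math.hypot(a, b);
    }

    /** Height on the page in page units: length of the transformed y axis (c, d). */
    static double heightInUnits(double c, double d) {
        return Math.hypot(c, d);
    }

    static double toInches(double units) {
        return units / UNITS_PER_INCH;
    }

    static double toMillimeters(double units) {
        return toInches(units) * MM_PER_INCH;
    }

    /** Effective resolution: pixels of the image resource per inch on the page. */
    static double dpi(int pixels, double units) {
        return pixels * UNITS_PER_INCH / units;
    }

    public static void main(String[] args) {
        // Hypothetical example: a 600 x 300 pixel image drawn with matrix 144 0 0 72 e f.
        double w = widthInUnits(144, 0);
        double h = heightInUnits(0, 72);
        System.out.println("size = " + w + "pu, " + h + "pu");
        System.out.println("size = " + toInches(w) + "in, " + toInches(h) + "in");
        System.out.println("size = " + toMillimeters(w) + "mm, " + toMillimeters(h) + "mm");
        System.out.printf("dpi  = %.0f dpi (X), %.0f dpi (Y)%n", dpi(600, w), dpi(300, h));
    }
}
```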

Your code, on the other hand, uses the current transformation matrix from a freshly instantiated graphics state with all values at their defaults. Thus, your code prints incorrect information on how the image is scaled on the page.

To get the correct information, you have to parse the sequence of operations executed for creating the document page.
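To give a feel for what such parsing involves, here is a deliberately tiny sketch that walks a content stream given as plain text. It is not how PDFBox does it and is nowhere near a real parser: the class ContentStreamSketch is a hypothetical name, tokens are assumed to be separated by whitespace, only the operators q, Q, cm and Do are handled, and every Do is assumed to draw an image (in real files it may also invoke a form XObject). Strings, arrays, inline images and compressed streams are ignored entirely.

```java
import java.awt.geom.AffineTransform;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Illustration only: a toy interpreter for the operators q, Q, cm and Do. */
final class ContentStreamSketch {

    /** Returns one entry per "Do" operator: the XObject name and its size in page units. */
    static List<String> imageSizes(String content) {
        Deque<AffineTransform> saved = new ArrayDeque<>();
        AffineTransform ctm = new AffineTransform(); // identity at the start of the page
        List<String> operands = new ArrayList<>();
        List<String> found = new ArrayList<>();

        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q": // save graphics state
                    saved.push(new AffineTransform(ctm));
                    break;
                case "Q": // restore graphics state
                    ctm = saved.pop();
                    break;
                case "cm": { // concatenate the six preceding numbers to the matrix
                    double[] m = new double[6];
                    for (int i = 0; i < 6; i++) {
                        m[i] = Double.parseDouble(operands.get(i));
                    }
                    ctm.concatenate(new AffineTransform(m));
                    break;
                }
                case "Do": { // draw the named XObject with the matrix as it is right now
                    double width = Math.hypot(ctm.getScaleX(), ctm.getShearY());
                    double height = Math.hypot(ctm.getShearX(), ctm.getScaleY());
                    found.add(operands.get(0) + " " + width + " x " + height);
                    break;
                }
                default: // anything else is treated as an operand of the next operator
                    operands.add(token);
                    continue;
            }
            operands.clear(); // an operator consumes its operands
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical page content drawing one image named /Im0.
        String content = "q 200 0 0 100 50 600 cm /Im0 Do Q";
        imageSizes(content).forEach(System.out::println);
    }
}
```

A real implementation has to deal with everything this sketch skips, which is the reason to build on a library's content stream processor rather than on string splitting.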

This is exactly what the PDFBox PrintImageLocations example does: It processes the page content stream (which contains all those operations), updating a copy of the values of the current graphics state, and when it sees an operation for drawing a bitmap image, it uses the value of the current transformation matrix at that very moment:

protected void processOperator( PDFOperator operator, List arguments ) throws IOException
{
    String operation = operator.getOperation();
    if( INVOKE_OPERATOR.equals(operation) )
    {
        COSName objectName = (COSName)arguments.get( 0 );
        Map<String, PDXObject> xobjects = getResources().getXObjects();
        PDXObject xobject = (PDXObject)xobjects.get( objectName.getName() );
        if( xobject instanceof PDXObjectImage )
        {
            PDXObjectImage image = (PDXObjectImage)xobject;
            PDPage page = getCurrentPage();
            int imageWidth = image.getWidth();
            int imageHeight = image.getHeight();
            double pageHeight = page.getMediaBox().getHeight();
            System.out.println("*******************************************************************");
            System.out.println("Found image [" + objectName.getName() + "]");

            Matrix ctmNew = getGraphicsState().getCurrentTransformationMatrix();
            ...
            [calculate dimensions and rotation of image on page]
            ... 

Thus, for your task that PDFBox example should be a good starting point.
