In PDFBox, how to change the origin (0,0) point of a PDRectangle object?


Problem description

The situation:
In PDFBox, the default origin (0,0) of PDRectangle objects seems to be the lower-left corner of a page.

For example, the following code gives you a square at the lower-left corner of a page, with each side 100 units long.

PDRectangle rectangle = new PDRectangle(0, 0, 100, 100);

The question:
Is it possible to change the origin to the upper-left corner, so that, for example, the code above gives you the same square at the upper-left corner of the page instead?

The reason I ask:
I was using PDFTextStripper to get the coordinates of the text (by using the getX() and getY() methods of the extracted TextPosition objects). The coordinates retrieved from TextPosition objects seem to have their origin (0,0) at the upper-left corner. I want the coordinates of my PDRectangle objects to have the same origin as the coordinates of my TextPosition objects.

I have tried to adjust the Y-coordinates of my PDRectangle by "page height minus Y-coordinate". This gives me the desired result, but it's not elegant. I want an elegant solution.

Note: Someone has asked a similar question. The answer there is what I tried, which is not the most elegant: "how to change the coordinates of a text in a pdf page from lower left to upper left".

Recommended answer

You can change coordinate systems to some extent, but most likely things won't get more elegant in the end.

First of all, let's clear up some misconceptions:

You assume:

"In PDFBox, PDRectangle objects' default origin (0,0) seems to be the lower-left corner of a page."

This is not true in all cases, merely often.

The area of the page that is displayed (on paper or on screen) is usually defined by the CropBox entry of the page in question:

CropBox rectangle (Optional; inheritable) A rectangle, expressed in default user space units, that shall define the visible region of default user space. When the page is displayed or printed, its contents shall be clipped (cropped) to this rectangle and then shall be imposed on the output medium in some implementation-defined manner.

... The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the Rotate entry in the page dictionary).

... In PostScript, the origin of default user space always corresponds to the lower-left corner of the output medium. While this convention is common in PDF documents as well, it is not required; the page dictionary’s CropBox entry can specify any rectangle of default user space to be made visible on the medium.

Thus, the origin (0,0) can literally be anywhere: at the lower left, at the upper left, in the middle of the page, or even far outside the displayed page area.

And by means of the Rotate entry, that area can even be rotated (by 90°, 180°, or 270°).

Putting the origin in the lower left (as you seem to have observed) is merely a convention.

Furthermore, you seem to think that the coordinate system is constant. This is not the case either: there are operations by which you can transform the user space coordinate system drastically. You can translate, rotate, mirror, skew, and/or scale it!

Thus, even if the coordinate system at the beginning is the usual one (origin in the lower left, x axis going right, y axis going up), it may be changed to something weird some way into the page content description. Drawing your rectangle new PDRectangle(0, 0, 100, 100) there might produce some rhomboid form just right of the page center.
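To make the rhomboid remark concrete, here is a minimal sketch using only java.awt.geom, not PDFBox. The skew matrix and its numbers are made up for illustration; they merely stand in for whatever transformation a page's content stream might have applied before your rectangle is drawn.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class SkewedSquareDemo {

    /** Returns the four corners of the rectangle (x, y, w, h) after applying the given matrix. */
    public static Point2D[] transformedCorners(AffineTransform ctm,
                                               double x, double y, double w, double h) {
        Point2D[] corners = {
            new Point2D.Double(x, y),
            new Point2D.Double(x + w, y),
            new Point2D.Double(x + w, y + h),
            new Point2D.Double(x, y + h)
        };
        Point2D[] result = new Point2D[corners.length];
        for (int i = 0; i < corners.length; i++) {
            result[i] = ctm.transform(corners[i], null);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical current transformation matrix: a horizontal skew plus a translation.
        // Constructor argument order is (m00, m10, m01, m11, m02, m12).
        AffineTransform ctm = new AffineTransform(1, 0, 0.5, 1, 300, 350);
        for (Point2D corner : transformedCorners(ctm, 0, 0, 100, 100)) {
            System.out.println(corner.getX() + ", " + corner.getY());
        }
    }
}
```

With this assumed matrix the square's corners land at (300, 350), (400, 350), (450, 450) and (350, 450), which is a parallelogram rather than a square.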

As you see, coordinates in PDF user space are a very dynamic matter. What you can do to tame the situation depends on the context in which you use your rectangle.

Unfortunately, you were quite vague in describing what you do. Thus, this answer will be somewhat vague, too.

If you want to draw a rectangle on an existing page, you first of all need a page content stream to write to, i.e. a PDPageContentStream instance, and it should be prepared in a manner guaranteeing that the original user space coordinate system has not been disturbed. You get such an instance by using the constructor with three boolean arguments and setting all of them to true:

PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);

Then you can apply a transformation to the coordinate system. You want the top left to be the origin and the y value to increase downwards. If the crop box of the page tells you that the top left has coordinates (xtl, ytl), you apply

contentStream.concatenate2CTM(new AffineTransform(1, 0, 0, -1, xtl, ytl));

and from here on you have the coordinate system you wanted, with the origin at the top left and the y coordinates mirrored.
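The effect of that matrix can be checked without PDFBox at all. The following is a small sketch using java.awt.geom.AffineTransform; the class name, the helper methods and the US Letter crop box are assumptions chosen for the example.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class TopLeftOriginDemo {

    /** Builds the flip described above: origin at (xtl, ytl), y axis pointing down. */
    public static AffineTransform topLeftTransform(double xtl, double ytl) {
        // Argument order is (m00, m10, m01, m11, m02, m12):
        // x' = x + xtl, y' = -y + ytl
        return new AffineTransform(1, 0, 0, -1, xtl, ytl);
    }

    /** Maps a point given in top-left coordinates to default user space. */
    public static Point2D toUserSpace(AffineTransform flip, double x, double y) {
        return flip.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Hypothetical crop box [0 0 612 792], so the top-left corner is (0, 792).
        AffineTransform flip = topLeftTransform(0, 792);
        Point2D origin = toUserSpace(flip, 0, 0);     // the page's top-left corner
        Point2D inner = toUserSpace(flip, 100, 100);  // 100 units right, 100 units down
        System.out.println(origin.getX() + ", " + origin.getY());
        System.out.println(inner.getX() + ", " + inner.getY());
    }
}
```

Under these assumptions, (0, 0) in the new system corresponds to (0, 792) in default user space, and (100, 100) corresponds to (100, 692).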

Be aware of one thing, though: if you are going to draw text, too, not only the y coordinate of the text insertion point is mirrored but also the text itself, unless you counteract that by adding a text matrix that mirrors as well! If you want to add a lot of text, therefore, this may not be as elegant as you want.
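Why a second mirroring matrix restores upright text can be seen from the matrix arithmetic alone. This sketch again uses only java.awt.geom; the class and method names and the sample numbers are illustrative assumptions, and the actual PDFBox call for setting a text matrix is deliberately left out.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

public class MirroredTextDemo {

    /**
     * Combines the page flip with a text matrix that flips again at (x, y).
     * Points pass through the text matrix first and the page flip second.
     */
    public static AffineTransform uprightTextMatrix(AffineTransform flip, double x, double y) {
        AffineTransform combined = new AffineTransform(flip);
        combined.concatenate(new AffineTransform(1, 0, 0, -1, x, y));
        return combined;
    }

    /** Returns where the glyph "up" direction (0, 1) points after applying the matrix. */
    public static Point2D upDirection(AffineTransform matrix) {
        return matrix.deltaTransform(new Point2D.Double(0, 1), null);
    }

    public static void main(String[] args) {
        // Hypothetical flip for a crop box whose top-left corner is (0, 792).
        AffineTransform flip = new AffineTransform(1, 0, 0, -1, 0, 792);
        AffineTransform text = uprightTextMatrix(flip, 50, 100);

        // With the flip alone, "up" points down, so glyphs appear mirrored.
        System.out.println(upDirection(flip).getY());
        // With the extra mirroring text matrix, "up" points up again.
        System.out.println(upDirection(text).getY());
        // The text origin (50, 100) in top-left coordinates sits at y = 692 in user space.
        System.out.println(text.getTranslateX() + ", " + text.getTranslateY());
    }
}
```

The two flips cancel in the scaling part while the translation still places the text at the mirrored position, which is exactly the bookkeeping you would have to repeat for every piece of text.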

If you don't want to use the rectangle in the content stream but for adding annotations instead, you are not subject to the transformations mentioned above, but you cannot make use of them either.

Thus, in this context you have to take the crop box as it is and transform your rectangle accordingly.
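One way to keep that conversion in a single place is a small helper such as the following sketch. It is plain Java with a hypothetical method name, it assumes an unrotated page, and it expects the crop box's lower-left x and upper-right y as inputs (in PDFBox these would be read from the page's crop box).

```java
public class AnnotationRectDemo {

    /**
     * Converts a rectangle given relative to the top-left corner of the crop box
     * (y growing downwards) into {llx, lly, urx, ury} in default user space.
     * Assumes the page has no Rotate entry in effect.
     */
    public static double[] toUserSpaceRect(double cropLowerLeftX, double cropUpperRightY,
                                           double x, double y, double width, double height) {
        double llx = cropLowerLeftX + x;
        double ury = cropUpperRightY - y;
        return new double[] { llx, ury - height, llx + width, ury };
    }

    public static void main(String[] args) {
        // Hypothetical crop box [0 0 612 792]; a 100 x 100 square at the top-left corner.
        double[] rect = toUserSpaceRect(0, 792, 0, 0, 100, 100);
        System.out.println(rect[0] + " " + rect[1] + " " + rect[2] + " " + rect[3]);
    }
}
```

For the assumed crop box, the square at the top-left corner becomes the rectangle with lower-left corner (0, 692) and upper-right corner (100, 792). This is the same "page height minus y" arithmetic from the question, only wrapped once and made aware of a crop box that does not start at (0, 0).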

As for why the TextPosition coordinates differ: essentially, for putting lines of text together in the right order and sorting the lines correctly, you don't want such an ever-changing coordinate system but a simple, stable one instead. Some PDFBox developers chose the top-left-origin, y-increasing-downwards variant for that, and so the TextPosition coordinates have been normalized to that scheme.

In my opinion, a better choice would have been to use the default user space coordinates, which would make the coordinates easier to re-use. You might, therefore, want to try working with textPosition.getTextMatrix().getTranslateX() and textPosition.getTextMatrix().getTranslateY() for a TextPosition object textPosition.
