What's the difference between text direction and page rotation adjusted coordinates?

Question

What's the difference between "page rotation adjusted coordinates" and "text direction adjusted coordinates" as referenced by the TextPosition class? A visual explanation is probably best.

Answer

PDFBox text extraction provides easy access to the coordinates of a text glyph in both of these coordinate systems:

Each PDF page can have a Rotate property whose allowed values are 0, 90, 180, and 270. A PDF viewer usually takes the page contents and displays them rotated clockwise by that angle. For example, a landscape page you view may actually have a landscape page size, or it may have a portrait page size and a Rotate value of 90 or 270.

The page rotation adjusted coordinate system is the coordinate system of the page as rotated according to its Rotate value, with the origin in the top left corner of the rotated page, x coordinates increasing rightwards and y coordinates increasing downwards.
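
To make this concrete, here is a minimal sketch (plain Java, no PDFBox dependency) that maps a point from unrotated default user space (origin bottom left, y up) into a top-left-origin, y-down system of the page as displayed. It assumes the page box has its lower left corner at (0, 0) and that Rotate is clockwise; it illustrates the definition above and is not PDFBox's own code. The class and method names are hypothetical.

```java
/**
 * Hypothetical helper: unrotated user space -> page rotation adjusted coordinates.
 * Assumes the page box is [0, 0, width, height] and Rotate is clockwise.
 */
final class PageRotationAdjusted {

    /** Returns {x', y'}: origin top left of the displayed page, y increasing downwards. */
    static double[] adjust(double x, double y, double width, double height, int rotate) {
        switch (((rotate % 360) + 360) % 360) {
            case 0:   return new double[] {x, height - y};
            case 90:  return new double[] {y, x};                   // left edge becomes top edge
            case 180: return new double[] {width - x, y};
            case 270: return new double[] {height - y, width - x};  // right edge becomes top edge
            default:  throw new IllegalArgumentException("Rotate must be a multiple of 90: " + rotate);
        }
    }

    /** Width and height of the page as a viewer displays it. */
    static double[] displayedSize(double width, double height, int rotate) {
        int r = ((rotate % 360) + 360) % 360;
        return (r == 90 || r == 270) ? new double[] {height, width} : new double[] {width, height};
    }

    public static void main(String[] args) {
        // Portrait page (595 x 842 pt) with Rotate 90, so it is displayed as landscape.
        double[] size = displayedSize(595, 842, 90);
        double[] p = adjust(100, 700, 595, 842, 90);
        // prints "displayed 842 x 595, point at (700, 100)"
        System.out.printf("displayed %.0f x %.0f, point at (%.0f, %.0f)%n", size[0], size[1], p[0], p[1]);
    }
}
```

A point near the top left of the portrait page ends up near the top right of the landscape view, which is what a clockwise quarter turn should do.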

Each text glyph can be drawn at an arbitrary angle (by means of the current transformation matrix and the text matrix).

The text direction adjusted coordinate system for a given text glyph is the coordinate system of the page rotated by a multiple of 90° so that the glyph is drawn upright or, as the glyph may be drawn at an arbitrary angle and not merely at multiples of 90°, at least as upright as possible. Again the origin is in the top left corner of the page, with x coordinates increasing rightwards and y coordinates increasing downwards.
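
A minimal sketch of that idea, under stated assumptions: the glyph's baseline direction is taken from the entries a and b of its text rendering matrix (the image of the text space x axis in unrotated user space), the quarter turn nearest to that direction is chosen, and the page is then treated as if it had that clockwise Rotate value. The class and method names are hypothetical, and this is an illustration of the definition, not how PDFBox actually computes it.

```java
/**
 * Hypothetical helper: choose the clockwise quarter turn that makes a glyph
 * as upright as possible, then express a point in that rotated system.
 * Assumes the page box is [0, 0, width, height].
 */
final class TextDirectionAdjusted {

    /** Nearest multiple of 90 degrees to the baseline angle, normalized to [0, 360). */
    static int uprightRotation(double a, double b) {
        double angle = Math.toDegrees(Math.atan2(b, a)); // counter-clockwise from +x
        int quarter = (int) Math.round(angle / 90.0);
        return ((quarter * 90) % 360 + 360) % 360;
    }

    /** Same rotate-and-flip mapping as for a page Rotate value (clockwise). */
    static double[] adjust(double x, double y, double width, double height, int rotate) {
        switch (rotate) {
            case 0:   return new double[] {x, height - y};
            case 90:  return new double[] {y, x};
            case 180: return new double[] {width - x, y};
            case 270: return new double[] {height - y, width - x};
            default:  throw new IllegalArgumentException("unexpected rotation: " + rotate);
        }
    }

    public static void main(String[] args) {
        // Baseline tilted 80 degrees (nearly bottom to top), e.g. a narrow column header.
        double rad = Math.toRadians(80);
        int r = uprightRotation(Math.cos(rad), Math.sin(rad));
        double[] p = adjust(50, 400, 595, 842, r);
        System.out.println("rotate by " + r + " -> (" + p[0] + ", " + p[1] + ")"); // rotate by 90 -> (400.0, 50.0)
    }
}
```

For text that is displayed upright on a page with Rotate 90, the baseline in user space points upwards, so this sketch also picks 90 and the two systems agree.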

Text on document pages is usually arranged so that it is displayed upright once the page rotation is applied. Thus the coordinates of each glyph in the two coordinate systems usually coincide (or at least nearly so: as they are calculated differently, float inaccuracies may cause small differences).

For text that is not displayed upright even after the page rotation is applied (e.g. headers of narrow table columns drawn at a right angle), you might prefer one system or the other, depending on what you are trying to achieve:

  • If you want to compare the position of arbitrary glyphs relative to each other, you obviously need a common coordinate system, so the text direction adjusted coordinates cannot be used; for this case PDFBox chose the page rotation adjusted coordinate system.

  • If you want to check whether two glyphs with the same text drawing direction are next to each other, i.e. probably form (part of) a word, testing their text direction adjusted coordinates may be preferable.
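
A rough sketch of the second case, assuming each glyph's text direction adjusted position, size and drawing direction are already known. The Glyph record and the tolerance factors are hypothetical; with PDFBox the values would presumably come from TextPosition getters such as getXDirAdj(), getYDirAdj(), getWidthDirAdj(), getHeightDir() and getDir(), but check the API of your PDFBox version.

```java
/**
 * Hypothetical word-adjacency test in text direction adjusted coordinates.
 * Tolerances are illustrative guesses, not values used by PDFBox.
 */
final class SameWordCheck {

    /** Position (top-left origin, y down), size, and drawing direction (0/90/180/270). */
    record Glyph(double xDirAdj, double yDirAdj, double width, double height, int dir) {}

    static boolean likelyAdjacent(Glyph a, Glyph b) {
        if (a.dir() != b.dir()) {
            return false; // different drawing directions mean different adjusted systems
        }
        double size = Math.max(a.height(), b.height());
        double gap = b.xDirAdj() - (a.xDirAdj() + a.width()); // space between a's end and b's start
        double baselineShift = Math.abs(b.yDirAdj() - a.yDirAdj());
        return Math.abs(gap) < 0.3 * size && baselineShift < 0.2 * size;
    }

    public static void main(String[] args) {
        // Two glyphs of a vertical column header, both drawn at 90 degrees.
        Glyph h = new Glyph(100.0, 200.0, 6.0, 10.0, 90);
        Glyph e = new Glyph(106.5, 200.0, 5.5, 10.0, 90);
        Glyph far = new Glyph(130.0, 200.0, 5.5, 10.0, 90);
        System.out.println(likelyAdjacent(h, e));   // true
        System.out.println(likelyAdjacent(h, far)); // false
    }
}
```

Because both glyphs are compared in the system where their text runs left to right, the same "small horizontal gap, same baseline" test works regardless of how the header is actually rotated on the page.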

(Actually, in my experience neither of these coordinate systems is the one you need for text extraction post-processing; often you need the coordinates in the unrotated default PDF page user space coordinate system, e.g. to overlay them with some markup annotation. To get them, one has to take the translation values from the associated text matrix and then de-normalize them...)
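
If all you have is a point in the page rotation adjusted system, a different route (not the text-matrix procedure described above) is to invert the rotate-and-flip mapping and shift by the page box's lower left corner. A minimal sketch, assuming Rotate is clockwise and the box is [llx, lly, llx + width, lly + height]; the class and method names are hypothetical.

```java
/**
 * Hypothetical helper: page rotation adjusted coordinates (top-left origin,
 * y down, page as displayed) -> unrotated default user space (y up).
 */
final class BackToUserSpace {

    static double[] toUserSpace(double xr, double yr, double llx, double lly,
                                double width, double height, int rotate) {
        double x;
        double y;
        switch (((rotate % 360) + 360) % 360) {
            case 0:   x = xr;         y = height - yr; break;
            case 90:  x = yr;         y = xr;          break;
            case 180: x = width - xr; y = yr;          break;
            case 270: x = width - yr; y = height - xr; break;
            default:  throw new IllegalArgumentException("Rotate must be a multiple of 90: " + rotate);
        }
        return new double[] {x + llx, y + lly}; // shift back by the box's lower left corner
    }

    public static void main(String[] args) {
        // Point (700, 100) on a 595 x 842 portrait page displayed with Rotate 90.
        double[] p = toUserSpace(700, 100, 0, 0, 595, 842, 90);
        System.out.println(p[0] + ", " + p[1]); // 100.0, 700.0
    }
}
```

Applying this to two opposite corners of a glyph's box gives a rectangle in default user space that could serve, for example, as the basis of a markup annotation's quad points.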