How can I get text's height from a PDF's transformation matrix?


Question

I am making a pdf parser and I have a problem when I am trying to read the transformation matrix (Tm) of a text.

For example, when I have a horizontal text, the transformation matrix looks like this:
"71.9871 0 0 73.5 178.668 522.2227 Tm", which means that the text's height is the d parameter (73.5), the horizontal-to-vertical scaling ratio of each character is a/d (71.9871/73.5), and the text is translated to the point (178.668, 522.2227).

If I rotate this text, then the transformation matrix looks like this:
"63.1614 -34.5367 35.2625 64.4888 181.8616 575.8494 Tm"

How can I get the height of the text, which is 73.5?

If I export the same file as an SVG file, I get this matrix: "0.8593 0.4699 -0.4798 0.8774 181.8616 266.0405", and the height of the text is given as 73.5. (I have noticed that if I divide the d parameter of my rotated text by the text's height (73.5), I get the d parameter of the SVG matrix (0.8774). But again, how can I know the text's height?)

Thank you.

Solution

As already mentioned in a comment, you actually have a multitude of matrices and scalars to deal with, at least the current transformation matrix, the text matrix, the font size, the horizontal scaling, and the page user unit setting. Of course, though, you can combine all these into one matrix.

Thus, let's assume the matrix you have is this combined one.
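For reference, here is a minimal sketch of how such a combination might be built. It assumes the text rendering matrix as defined in the PDF specification, i.e. a matrix made from font size, horizontal scaling and text rise, multiplied by the text matrix and then by the current transformation matrix. The helper names (`matmul`, `from_pdf`, `text_rendering_matrix`) are made up for illustration, and the page user unit is left out; it would be one more uniform scale on top.

```python
def matmul(m, n):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def from_pdf(a, b, c, d, e, f):
    """Expand the six PDF matrix operands into a full 3x3 matrix."""
    return [[a, b, 0], [c, d, 0], [e, f, 1]]


def text_rendering_matrix(font_size, h_scale, rise, tm, ctm):
    """Combine the text state parameters with Tm and the CTM.

    h_scale is the horizontal scaling as a factor (Tz / 100),
    tm and ctm are 3x3 matrices as returned by from_pdf().
    """
    params = [[font_size * h_scale, 0, 0],
              [0, font_size, 0],
              [0, rise, 1]]
    return matmul(matmul(params, tm), ctm)


# Hypothetical state: font size 1, no extra scaling, identity CTM,
# and the horizontal Tm from the question.
identity = from_pdf(1, 0, 0, 1, 0, 0)
tm = from_pdf(71.9871, 0, 0, 73.5, 178.668, 522.2227)
combined = text_rendering_matrix(1, 1.0, 0, tm, identity)
```

With font size 1 and an identity CTM the combined matrix is just Tm itself, which is the situation the question implicitly assumes.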

To determine the factors by which the font is stretched from its size 1 default state, you could simply apply that matrix to a horizontal and a vertical line segment of length 1, e.g. [0, 0, 1] to [1, 0, 1] and [0, 0, 1] to [0, 1, 1], and then calculate the lengths of the resulting line segments.
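A small sketch of that procedure, assuming the matrix is given as the six PDF operands (a, b, c, d, e, f) and points are transformed as row vectors [x, y, 1]. The function names are illustrative only.

```python
import math


def transform(matrix, point):
    """Apply a PDF matrix (a, b, c, d, e, f) to a point (x, y)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def segment_length(matrix, start, end):
    """Length of the segment start -> end after transformation."""
    x0, y0 = transform(matrix, start)
    x1, y1 = transform(matrix, end)
    return math.hypot(x1 - x0, y1 - y0)


# The horizontal text matrix from the question.
horizontal = (71.9871, 0, 0, 73.5, 178.668, 522.2227)
width_factor = segment_length(horizontal, (0, 0), (1, 0))
height_factor = segment_length(horizontal, (0, 0), (0, 1))
```

The translation part (e, f) cancels out when the two endpoints are subtracted, so only a, b, c and d influence the result.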

PS Doing some minor linear algebra, you will see that for a matrix

a b 0
c d 0
e f 1

this amounts to a horizontal font extent of sqrt(a² + b²) and a vertical font extent (the height) of sqrt(c² + d²).
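As a quick check of these formulas against the rotated matrix from the question, a sketch along the following lines could be used (`font_extents` is a made-up helper name):

```python
import math


def font_extents(a, b, c, d):
    """Return (horizontal extent, vertical extent) for a PDF matrix.

    The translation operands e and f do not affect the extents,
    so they are not needed here.
    """
    return math.hypot(a, b), math.hypot(c, d)


# Rotated text matrix from the question: 63.1614 -34.5367 35.2625 64.4888 ...
width, height = font_extents(63.1614, -34.5367, 35.2625, 64.4888)
print(width, height)
```

Up to the rounding of the operands in the PDF, the vertical extent comes out at about 73.5 and the horizontal extent at about 71.987, i.e. the same values as the d and a parameters of the unrotated matrix.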
