Calculating the exact positions of (Td, TD, Tm, cm, T*) content stream in PDF?

Published: 2020/5/25 4:59:43 | Tags: pdf, accessibility, pdfbox, tagging, pdf-manipulation


Question

Getting or calculating the exact positions of (Td, TD, Tm, cm, T*) content stream operators in a PDF?

As a human I am able to calculate the positions of tags in the PDF content stream (whether that means replacing the last Td, adding to the last Td, or multiplying by the font size) by comparing where the glyphs are located in the PDF with the position values in the content stream. But I am unable to calculate the exact positions of glyphs programmatically. Please see the screenshot.

In the image above, the left box shows the glyphs as rendered in the PDF viewer and the right box contains the related content stream. In the content stream I highlighted two Td positions.

In the first circle:

3.321 -6.475999832 Td

The Td values should be added to the last Td position. Assume that position is x1, y1:

Current_x_pos = x1 + 3.321

Current_y_pos = y1 - 6.475999832

Then we get the exact position of the glyph "t".

In the second highlighted circle, the new Td values (231.544 366.377990 Td) completely replace the previous position:

Current_x_pos = 231.544

Current_y_pos = 366.377990

Sometimes, though, the parent operator is Tm, and in that case the formula seems to be:

Current_x_pos = x1 + (tdx1 * font_size)

Current_y_pos = y1 + (tdy1 * font_size)

So sometimes we need to multiply like above, and sometimes to add. Programmatically, how can I know which applies, so that I can parse exact positions? (A new screenshot was added for the multiplication case.)

Any help? Thanks.

Recommended answer

When we need to multiply like above, and sometimes add. Programmatically, how can I know this, to parse exact positions?

It's quite simple: for a Td operation you always multiply. See the specification ISO 32000-1 (similarly in ISO 32000-2).
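For reference, this is the assignment that the text-positioning operators table of ISO 32000-1 gives for `tx ty Td`, written in the specification's row-vector convention (the translation matrix premultiplies the current text line matrix):

```latex
T_m = T_{lm} =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t_x & t_y & 1 \end{bmatrix}
\times T_{lm}
```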


For a freshly initialized (i.e. identity) text line matrix T_lm this matrix multiplication looks like replacing its bottom row with t_x t_y 1.

For a text line matrix T_lm that differs from the identity only in its bottom row, this matrix multiplication looks like an addition to the bottom row, e.g. x y 1 becomes x+t_x y+t_y 1.
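A minimal sketch of the two cases above, assuming plain Python lists for 3x3 matrices. The helpers `mat_mul` and `td` are hypothetical names, not part of PDFBox or any other library. The Td operands come from the question's screenshots; the previous position (100, 500) in the second case is a made-up value.

```python
# Row-vector convention used by the PDF spec: a point (x, y) is the row
# [x y 1], transformed by multiplying a matrix on its right.
Matrix = list[list[float]]

IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(m: Matrix, n: Matrix) -> Matrix:
    """Plain 3x3 matrix product m x n."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def td(tlm: Matrix, tx: float, ty: float) -> Matrix:
    """New text line matrix after 'tx ty Td': [1 0 0, 0 1 0, tx ty 1] x Tlm."""
    translate = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [tx, ty, 1.0]]
    return mat_mul(translate, tlm)


# Identity Tlm (e.g. right after BT): the bottom row is simply replaced.
after_bt = td(IDENTITY, 231.544, 366.377990)
print(after_bt[2])  # [231.544, 366.37799, 1.0]

# Tlm that differs from identity only in its bottom row: operands are added.
# The previous position (100, 500) is a hypothetical value.
prev = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [100.0, 500.0, 1.0]]
moved = td(prev, 3.321, -6.475999832)
print(moved[2])  # approximately [103.321, 493.524000168, 1.0]
```

The same `td` call handles both situations; the "replace" and "add" behaviours are just what the product looks like for these particular input matrices.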


For a text line matrix T_lm like in your second example

a 0 0
0 a 0
x y 1

this matrix multiplication looks like a multiplication by a followed by an addition to the bottom row, i.e. x y 1 becomes x+a·t_x y+a·t_y 1. If the font size parameter of the preceding Tf operation was 1, then a is effectively the resulting font size, which gave rise to your assumption that the font size is part of the formula.
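To illustrate the scaled case, here is a sketch with hypothetical operators `9 0 0 9 100 700 Tm` followed by `1.5 -1.2 Td` (assuming a font size parameter of 1 in Tf, so that a = 9 plays the role of the font size). The helper names are illustrative only. The generic matrix product yields the same bottom row as the "multiply by a, then add" shortcut, without the program having to know which shortcut applies.

```python
Matrix = list[list[float]]


def mat_mul(m: Matrix, n: Matrix) -> Matrix:
    """Plain 3x3 matrix product m x n (row-vector convention)."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def td(tlm: Matrix, tx: float, ty: float) -> Matrix:
    """Text line matrix after 'tx ty Td': [1 0 0, 0 1 0, tx ty 1] x Tlm."""
    return mat_mul([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [tx, ty, 1.0]], tlm)


# Hypothetical operators: "9 0 0 9 100 700 Tm" followed by "1.5 -1.2 Td".
a, x, y = 9.0, 100.0, 700.0
tlm = [[a, 0.0, 0.0], [0.0, a, 0.0], [x, y, 1.0]]
tx, ty = 1.5, -1.2

new_tlm = td(tlm, tx, ty)
# The generic product reproduces the shortcut x + a*tx, y + a*ty ...
print(new_tlm[2][0], x + a * tx)  # 113.5 113.5
print(new_tlm[2][1], y + a * ty)
# ... and leaves the scaling part of the matrix untouched.
print(new_tlm[0], new_tlm[1])  # [9.0, 0.0, 0.0] [0.0, 9.0, 0.0]
```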


In general, for an arbitrary, non-degenerate text line matrix T_lm

a b 0
c d 0
x y 1

this matrix multiplication looks even more complex: x y 1 becomes x+a·t_x+c·t_y y+b·t_x+d·t_y 1.
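Writing the product out in full shows where each term comes from: the first two rows of T_lm are unchanged, and the new bottom row is t_x times the first row plus t_y times the second row plus the old bottom row. Setting b = c = 0 and d = a gives the scaled case, and a = d = 1 gives the pure addition case.

```latex
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ t_x & t_y & 1 \end{bmatrix}
\times
\begin{bmatrix} a & b & 0 \\ c & d & 0 \\ x & y & 1 \end{bmatrix}
=
\begin{bmatrix}
  a & b & 0 \\
  c & d & 0 \\
  x + a\,t_x + c\,t_y & y + b\,t_x + d\,t_y & 1
\end{bmatrix}
```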

So, regarding your question:

Programmatically, how can I know this, to parse exact positions?

Your program should simply always use matrix multiplication and ignore what the result looks like at the level of the separate coordinates.

What makes the second circled instruction look like a mere replacement is that the prior text line matrix is the identity matrix. This is not due to the restore-state operation, as François assumed, but more simply to the begin-text-object operator BT, which initializes both the text matrix T_m and the text line matrix T_lm to the identity matrix.

Since the text matrix and the text line matrix are reset at the start of a text object, and the graphics state cannot be saved or restored inside a text object, the save and restore graphics state operations are not to blame in this case.
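Putting the advice to "always use matrix multiplication" into practice, here is a hypothetical minimal tracker for the operators named in the question. The class and method names are invented for illustration. The operator semantics follow ISO 32000-1: `cm` premultiplies the CTM, `BT` resets T_m and T_lm, `Tm` sets both, `tx ty TD` behaves like `-ty TL` followed by `tx ty Td`, and `T*` behaves like `0 -TL Td`. The sketch deliberately ignores q/Q, text-showing operators (which advance T_m), text rise, spacing and horizontal scaling, and it assumes the stream starts with an identity CTM, so `origin()` is the start of the current line in default user space.

```python
Matrix = list[list[float]]


def identity() -> Matrix:
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def from_operands(a: float, b: float, c: float, d: float,
                  e: float, f: float) -> Matrix:
    """3x3 matrix for the six operands 'a b c d e f' of cm or Tm."""
    return [[a, b, 0.0], [c, d, 0.0], [e, f, 1.0]]


def mat_mul(m: Matrix, n: Matrix) -> Matrix:
    """Plain 3x3 matrix product m x n (row-vector convention)."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


class TextPositionTracker:
    """Hypothetical tracker for cm, BT, Tm, TL, Td, TD and T* only."""

    def __init__(self) -> None:
        self.ctm = identity()
        self.tm = identity()
        self.tlm = identity()
        self.leading = 0.0  # text leading, set by TL (and by TD)

    def apply(self, op: str, *args: float) -> None:
        if op == "cm":    # CTM' = M x CTM
            self.ctm = mat_mul(from_operands(*args), self.ctm)
        elif op == "BT":  # begin text object: Tm = Tlm = identity
            self.tm, self.tlm = identity(), identity()
        elif op == "Tm":  # sets both Tm and Tlm
            self.tlm = from_operands(*args)
            self.tm = [row[:] for row in self.tlm]
        elif op == "TL":
            self.leading = args[0]
        elif op == "Td":  # Tm = Tlm = [1 0 0, 0 1 0, tx ty 1] x Tlm
            tx, ty = args
            self.tlm = mat_mul(from_operands(1.0, 0.0, 0.0, 1.0, tx, ty),
                               self.tlm)
            self.tm = [row[:] for row in self.tlm]
        elif op == "TD":  # same effect as: -ty TL  tx ty Td
            tx, ty = args
            self.leading = -ty
            self.apply("Td", tx, ty)
        elif op == "T*":  # same effect as: 0 -TL Td
            self.apply("Td", 0.0, -self.leading)
        else:
            raise ValueError(f"operator not handled in this sketch: {op}")

    def origin(self) -> tuple[float, float]:
        """Text-space origin mapped through Tm and the CTM (rise assumed 0)."""
        m = mat_mul(self.tm, self.ctm)
        return m[2][0], m[2][1]


# Second circle of the question: BT resets Tlm, so Td looks like a replacement.
t = TextPositionTracker()
t.apply("BT")
t.apply("Td", 231.544, 366.377990)
print(t.origin())  # (231.544, 366.37799)
```

Because every positioning operator goes through the same `mat_mul`, the program never has to decide between "replace", "add" or "multiply by the font size"; those are only different-looking results of one product.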

(Screenshots taken from the copy of ISO 32000-1 shared by Adobe.)
