Calculate correct width of a text

Question

I need to read a plan exported by AutoCAD to PDF and use PDFBox to place some markers with text on it. Everything works fine except the calculation of the width of the text that is written next to the markers.

I skimmed through the whole PDF specification and read in detail the parts that deal with graphics and text, but to no avail. As far as I understand, the glyph coordinate space is set up as 1/1000 of the user coordinate space. Hence the width needs to be scaled down by a factor of 1000, but the result is still only a fraction of the real width.

This is what I am doing to position the text:

float textWidth = font.getStringWidth(marker.id) * 0.043f;
contentStream.beginText();
contentStream.setTextScaling(1, 1, 0, 0);
contentStream.moveTextPositionByAmount(
  marker.endX + marker.getXTextOffset(textWidth, fontPadding),
  marker.endY + marker.getYTextOffset(fontSize, fontPadding));
contentStream.drawString(marker.id);
contentStream.endText();

The factor 0.043f works as an approximation for one document but fails for the next. Do I need to reset any transformation matrix other than the text matrix?

EDIT: A full idea example project is on github with tests and example pdfs: https://github.com/ascheucher/pdf-stamp-prototype

Thanks for your help!

Solution

Unfortunately, the question and comments only provide (by running the sample project) the actual result for two source documents, plus this description:

The annotating text should be center aligned on the top and bottom marker, aligned to the left on the right marker and aligned to the right on the left marker. The alignment is not working for me, as the font.getStringWidth( .. ) returns only a fraction of what it seems to be. And the discrepancy seems to be different in both PDFs.

They do not include a concrete sample discrepancy to repair.

There are several issues in the code, though, which may lead to such observations (and to other ones, too). They should be fixed first; this may already resolve the issues observed by the OP.

Which box to take

The code of the OP derives several values from the media box:

PDRectangle pageSize = page.findMediaBox();
float pageWidth = pageSize.getWidth();
float pageHeight = pageSize.getHeight();
float lineWidth = Math.max(pageWidth, pageHeight) / 1000;
float markerRadius = lineWidth * 10;
float fontSize = Math.min(pageWidth, pageHeight) / 20;
float fontPadding = Math.max(pageWidth, pageHeight) / 100;

These seem to be chosen to be optically pleasing in relation to the page size. In general, however, the media box is not the final displayed or printed page size; the crop box is. Thus, it should be

PDRectangle pageSize = page.findCropBox();

(Actually the trim box, i.e. the intended dimensions of the finished page after trimming, might be even more appropriate; the trim box defaults to the crop box. For details, see the section on page boundaries in the PDF specification.)

This is not relevant for the given sample documents as they do not contain explicit crop box definitions, so the crop box defaults to the media box. It might be relevant for other documents, though, e.g. those the OP could not include.

Which PDPageContentStream constructor to use

The code of the OP adds a content stream to the page at hand using this constructor:

PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);

This constructor appends (first true) and compresses (second true) but unfortunately it continues in the graphics state left behind by the pre-existing content.

The following parts of the graphics state matter for the observations at hand:

  • Transformation matrix - it may have been changed to scale (or rotate, skew, move ...) any new content added
  • Character spacing - it may have been changed to put any new characters added nearer to or farther from each other
  • Word spacing - it may have been changed to put any new words added nearer to or farther from each other
  • Horizontal scaling - it may have been changed to scale any new characters added
  • Text rise - it may have been changed to displace any new characters added vertically

Thus, a constructor should be chosen which also resets the graphics state:

PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);

The third true tells PDFBox to reset the graphics state, i.e. to surround the former content with a save-state/restore-state operator pair.

This is relevant for the given sample documents: in them, at least the transformation matrix is changed.
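The effect of a leftover transformation matrix can be illustrated without PDFBox. The following is only a sketch using `java.awt.geom.AffineTransform`; the class name and the concrete matrix values are made up for illustration and are not taken from the sample documents. It shows where a point ends up when the existing content has left a scaling matrix behind, compared with the situation after the former content has been wrapped in a save-state/restore-state (`q` ... `Q`) pair.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Illustration only: how a matrix left behind by earlier content
// changes where newly added content ends up on the page.
final class LeftoverMatrixSketch {

    /** Maps a point given in the current coordinate system to default user space. */
    static Point2D toDefaultUserSpace(AffineTransform currentMatrix, double x, double y) {
        return currentMatrix.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Hypothetical matrix left behind by the pre-existing content:
        // scale by 0.5 and move by (10, 20). The six values are in the same
        // order as the operands a b c d e f of the PDF "cm" operator.
        AffineTransform leftover = new AffineTransform(0.5, 0, 0, 0.5, 10, 20);

        // With the former content enclosed in q ... Q, new content starts
        // from the matrix the page started with (identity in this sketch).
        AffineTransform reset = new AffineTransform();

        System.out.println("leftover matrix: " + toDefaultUserSpace(leftover, 100, 100));
        System.out.println("reset matrix:    " + toDefaultUserSpace(reset, 100, 100));
    }
}
```

With a matrix like the hypothetical one above, both the position and the apparent size of added text change, which would be consistent with a correction factor that differs from document to document.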

Setting and using the CalRGB color space

The OP's code sets the stroking and non-stroking color spaces to a calibrated color space:

contentStream.setStrokingColorSpace(new PDCalRGB());
contentStream.setNonStrokingColorSpace(new PDCalRGB());

Unfortunately, new PDCalRGB() does not create a valid CalRGB color space object: its required WhitePoint value is missing. Thus, before selecting a calibrated color space, initialize it properly.

Thereafter the OP's code sets the colors using

contentStream.setStrokingColor(marker.color.r, marker.color.g, marker.color.b);
contentStream.setNonStrokingColor(marker.color.r, marker.color.g, marker.color.b);

Unfortunately, these (int, int, int) overloads use the RG and rg operators, which implicitly select the DeviceRGB color space. To avoid overwriting the current color space, use the (float[]) overloads with normalized (0..1) values instead.

While this is not relevant for the observed issue, it causes error messages in PDF viewers.
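Going from integer color values to the normalized components expected by the (float[]) overloads is a simple division. The helper below is a minimal sketch; the class and method names are hypothetical, and it assumes the marker colors are stored as integers in the range 0..255.

```java
import java.util.Arrays;

// Illustration only: normalize 0..255 RGB values to the 0..1 range.
final class ColorComponentsSketch {

    /** Converts integer RGB values (assumed 0..255) to normalized components. */
    static float[] normalize(int r, int g, int b) {
        return new float[] { r / 255f, g / 255f, b / 255f };
    }

    public static void main(String[] args) {
        // Hypothetical marker color.
        float[] components = normalize(255, 128, 0);
        System.out.println(Arrays.toString(components));
    }
}
```

The resulting array would then be passed to the (float[]) overloads of setStrokingColor and setNonStrokingColor mentioned above, after a properly initialized color space has been selected.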

Calculating the width of a drawn string

The OP's code calculates the width of a drawn string using

float textWidth = font.getStringWidth(marker.id) * 0.043f;

and the OP is surprised

The * 0.043f works as an approximation for one document, but fails for the next.

There are two factors building this "magic" number:

  • As the OP has already remarked, the glyph coordinate space is set up as 1/1000 of the user coordinate space, and the number returned by getStringWidth is in glyph space. This accounts for a factor of 0.001.

  • What the OP has overlooked is that he wants the width of the string at the font size he selected. The font object has no knowledge of the current font size and effectively returns the width for a font size of 1. As the OP selects the font size dynamically as Math.min(pageWidth, pageHeight) / 20, this factor varies. For the two given sample documents it is about 42, but it is probably totally different in other documents.
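Put together, the two factors suggest replacing the magic number by a division by 1000 and a multiplication by the font size. The following is a minimal sketch of that arithmetic in plain Java; the class name, the method name, the page sizes, and the glyph-space width of 2500 are made up for illustration. It assumes an identity text matrix and default character spacing, word spacing, and horizontal scaling.

```java
// Illustration only: convert a glyph-space string width to user space units.
final class TextWidthSketch {

    /** Glyph space is 1/1000 of text space for most font types. */
    static final float GLYPH_UNITS_PER_TEXT_UNIT = 1000f;

    /**
     * @param glyphSpaceWidth width as returned by font.getStringWidth(text)
     * @param fontSize        font size the text will be drawn with
     * @return width of the drawn text in user space units
     */
    static float toUserSpace(float glyphSpaceWidth, float fontSize) {
        return glyphSpaceWidth / GLYPH_UNITS_PER_TEXT_UNIT * fontSize;
    }

    public static void main(String[] args) {
        float glyphSpaceWidth = 2500f; // hypothetical getStringWidth result

        // Font sizes derived as in the OP's code: min(pageWidth, pageHeight) / 20.
        float fontSizeSmallPage = Math.min(595f, 842f) / 20;   // hypothetical small page
        float fontSizeLargePage = Math.min(2384f, 3370f) / 20; // hypothetical large page

        // The same string gets a different user space width on each page,
        // which is why a single constant factor cannot work.
        System.out.println(toUserSpace(glyphSpaceWidth, fontSizeSmallPage));
        System.out.println(toUserSpace(glyphSpaceWidth, fontSizeLargePage));
    }
}
```

In the OP's code this would correspond to something like font.getStringWidth(marker.id) / 1000 * fontSize in place of the multiplication by 0.043f.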

Positioning text

Starting from an identity text matrix, the OP's code positions the text like this:

contentStream.moveTextPositionByAmount(
    marker.endX + marker.getXTextOffset(textWidth, fontPadding),
    marker.endY + marker.getYTextOffset(fontSize, fontPadding));

using methods getXTextOffset and getYTextOffset:

public float getXTextOffset(float textWidth, float fontPadding) {
    if (getLocation() == Location.TOP)
        return (textWidth / 2 + fontPadding) * -1;
    else if (getLocation() == Location.BOTTOM)
        return (textWidth / 2 + fontPadding) * -1;
    else if (getLocation() == Location.RIGHT)
        return 0 + fontPadding;
    else
        return (textWidth + fontPadding) * -1;
}

public float getYTextOffset(float fontSize, float fontPadding) {
    if (getLocation() == Location.TOP)
        return 0 + fontPadding;
    else if (getLocation() == Location.BOTTOM)
        return (fontSize + fontPadding) * -1f;
    else
        return fontSize / 2 * -1;
}

In case of getXTextOffset I doubt that adding fontPadding for Location.TOP and Location.BOTTOM makes sense, especially in the light of the OP's desire

The annotating text should be center aligned on the top and bottom marker

For the text to be centered it should not be shifted off-center.
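A variant of getXTextOffset without the extra shift for the top and bottom markers could look as follows. This is only a sketch: the Location enum is re-declared here so the example is self-contained, the LEFT constant is an assumption (the original code only handles it in the final else branch), and the location is passed in as a parameter instead of being read via getLocation().

```java
// Illustration only: horizontal text offset relative to the marker end point.
final class XOffsetSketch {

    // Re-declared for this sketch; LEFT is assumed to be the fourth location.
    enum Location { TOP, BOTTOM, LEFT, RIGHT }

    static float xTextOffset(Location location, float textWidth, float fontPadding) {
        switch (location) {
            case TOP:
            case BOTTOM:
                // Centered: start half the text width to the left, no padding.
                return -textWidth / 2;
            case RIGHT:
                // Left-aligned: start one padding to the right of the marker.
                return fontPadding;
            default:
                // LEFT, right-aligned: the text ends one padding left of the marker.
                return -(textWidth + fontPadding);
        }
    }

    public static void main(String[] args) {
        for (Location location : Location.values()) {
            System.out.println(location + ": " + xTextOffset(location, 40f, 5f));
        }
    }
}
```

The textWidth passed in is assumed to be the width in user space units, i.e. already scaled by the font size.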

The case of getYTextOffset is more difficult. The OP's code is built upon two misunderstandings: It assumes

  • that the text position selected by moveTextPositionByAmount is the lower left, and
  • that the font size is the character height.

Actually, the text position lies on the baseline; the glyph origin of the next drawn glyph will be placed there.

Thus, the y position either has to be corrected to take the descent into account (for centering on the whole glyph height) or has to use only the ascent (for centering on the above-baseline glyph height).

Furthermore, the font size does not denote the actual character height. Rather, fonts are arranged so that the nominal height of tightly spaced lines of text is 1 unit for font size 1. "Tightly spaced" implies that some small amount of additional inter-line space is contained in the font size.

In essence, for centering vertically one has to decide what to center on: the whole height or the above-baseline height, and that of the first letter only, of the whole label, or of all font glyphs. PDFBox does not readily supply the necessary information for all cases, but methods like PDFont.getFontBoundingBox() should help.
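Once ascent and descent values are available, the baseline offsets for the different choices are simple arithmetic. The sketch below is one possible way to compute them, not the OP's or PDFBox's method; all names are hypothetical, and the ascent and descent are assumed to be given in glyph space units (descent negative), for example taken from the font bounding box or the font descriptor.

```java
// Illustration only: vertical baseline offsets relative to the marker end point.
final class YOffsetSketch {

    /** Converts glyph space units to user space for the given font size. */
    static float toUserSpace(float glyphUnits, float fontSize) {
        return glyphUnits / 1000f * fontSize;
    }

    /** Baseline offset that centers the whole glyph height (descent to ascent). */
    static float centerOnWholeHeight(float ascent, float descent, float fontSize) {
        return -toUserSpace(ascent + descent, fontSize) / 2;
    }

    /** Baseline offset that centers only the above-baseline glyph height. */
    static float centerOnAscent(float ascent, float fontSize) {
        return -toUserSpace(ascent, fontSize) / 2;
    }

    /** Baseline offset that puts the lowest descender fontPadding above the marker. */
    static float above(float descent, float fontSize, float fontPadding) {
        return fontPadding - toUserSpace(descent, fontSize);
    }

    /** Baseline offset that puts the highest ascender fontPadding below the marker. */
    static float below(float ascent, float fontSize, float fontPadding) {
        return -(fontPadding + toUserSpace(ascent, fontSize));
    }

    public static void main(String[] args) {
        // Hypothetical font metrics in glyph space units and a hypothetical font size.
        float ascent = 700f, descent = -200f, fontSize = 10f, fontPadding = 2f;
        System.out.println(centerOnWholeHeight(ascent, descent, fontSize));
        System.out.println(centerOnAscent(ascent, fontSize));
        System.out.println(above(descent, fontSize, fontPadding));
        System.out.println(below(ascent, fontSize, fontPadding));
    }
}
```

Which of these offsets looks right depends on the labels: for labels without descenders, centering on the ascent usually matches the visual center better than centering on the whole height.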
