How to find whitespace via Glyph AdvanceWidths (in Indices)


Problem description

Hi. I want to extract some text strings (using Java) from an XPS document. It is not much text, and I am not looking for a full-blown, do-everything XPS software package. I do not need to do any rendering at all, just get the Unicode text. I have also read up on the XPS file format and understand its structure.

I receive output from a particular application (not under my control) that creates the XPS files and does not put whitespace in the UnicodeString. This is allowable, because the Indices attribute of each Glyphs element provides AdvanceWidth settings that position the glyphs instead.

However, I have not found documentation on how to determine whether or not an AdvanceWidth is large enough to constitute whitespace.

For example, the string:

"1 Here is a string of words strung together. I need to separate them to extract them...."

would be represented (in the file from which I need to extract text) something like this:

<FixedPage . . .
<Glyphs Fill="#ff000000" FontUri="/Documents/1/Resources/Fonts/C33C1892-4299-487A-9A63-97230919AAA4.odttf"
FontRenderingEmSize="10.5596" StyleSimulations="None" OriginX="38.08"
OriginY="229.12"
Indices="3,27;25,331;39;40;36;39,73;3,27;50,77;53,73;3,27;36;47;44;57;40,855;49;36,250;47,447;20;21,57;20;3;11;21;12,164;23;21;24,57;3,27;11;26;12,361;25;19,57;3,27;11;23;12,200;21;3;11,34;23;12,1080;22,57;28;3,27;11,34;28;12,380;22;22,319;19;3;11;20;12,258;41;3,27;22;16,34;23,377;16,341;20;24,87;11;26;12,214;23;18,27;20,382;28;18;20"
UnicodeString="1Hereisastringofwordsstrungtogether.Ineedtoseparatethemtoextractthem...." />
 . . .
</FixedPage>

In the above, the Indices shown do not go with the string; I just wanted to give you the general idea of the format.

I know the second field of each Indices entry is the AdvanceWidth, but I have not found an easy (or any) way to determine where to put spaces in the output string.
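As a rough illustration of that layout, here is a minimal Java sketch that splits an Indices value into its semicolon-separated entries and pulls out the optional AdvanceWidth (the second comma-separated field). The names IndicesParser, GlyphEntry and parse are made up for this example. It assumes the simple case only: a leading cluster mapping such as (2:1) is discarded, and any fields after the AdvanceWidth are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical helper: splits an XPS Indices attribute value into entries. */
final class IndicesParser {

    /** One Indices entry: the glyph index text (may be empty) and an optional advance width. */
    static final class GlyphEntry {
        final String glyphIndex;
        final OptionalDouble advanceWidth;

        GlyphEntry(String glyphIndex, OptionalDouble advanceWidth) {
            this.glyphIndex = glyphIndex;
            this.advanceWidth = advanceWidth;
        }
    }

    static List<GlyphEntry> parse(String indices) {
        List<GlyphEntry> entries = new ArrayList<>();
        // Entries are separated by ';'. The -1 limit keeps empty trailing entries.
        for (String raw : indices.split(";", -1)) {
            String entry = raw.trim();
            // Assumption: cluster mappings like "(2:1)" are simply dropped here.
            if (entry.startsWith("(")) {
                entry = entry.substring(entry.indexOf(')') + 1);
            }
            // Fields inside an entry are separated by ','.
            String[] fields = entry.split(",", -1);
            OptionalDouble advance = OptionalDouble.empty();
            if (fields.length > 1 && !fields[1].trim().isEmpty()) {
                advance = OptionalDouble.of(Double.parseDouble(fields[1].trim()));
            }
            entries.add(new GlyphEntry(fields[0].trim(), advance));
        }
        return entries;
    }

    public static void main(String[] args) {
        // The first few entries of the Indices value shown above.
        for (GlyphEntry e : parse("3,27;25,331;39;40")) {
            String advance = e.advanceWidth.isPresent()
                    ? String.valueOf(e.advanceWidth.getAsDouble())
                    : "(font default)";
            System.out.println("glyph " + e.glyphIndex + ", advance " + advance);
        }
    }
}
```

Entries without a second field carry no explicit advance, so the glyph falls back to the width defined in the font itself.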

Can anyone shed some light on this or point me to a good source of information?

Thanks, Alan

Solution

Since AdvanceWidth is relative to font size (the XPS specification measures it in hundredths of the font em size), I think a value of 200 or above would indicate whitespace, as a rough rule of thumb.

Does this make sense? Alan
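To make the rule of thumb concrete, here is a minimal Java sketch that walks a UnicodeString alongside its Indices and appends a space after any character whose AdvanceWidth meets the threshold. The names SpaceInserter and insertSpaces, and the sample glyph indices, are made up for illustration. The sketch assumes one Indices entry per UTF-16 code unit of the UnicodeString (no cluster mappings) and uses the guessed value of 200 from above. Because an advance width also covers the glyph's own width, a fixed threshold can misfire on wide glyphs, so treat the value as something to tune per font and document.

```java
/** Hypothetical helper applying the "AdvanceWidth >= 200 means space" rule of thumb. */
final class SpaceInserter {

    /** Guessed threshold from the rule of thumb above, not a value from any specification. */
    static final double SPACE_THRESHOLD = 200.0;

    static String insertSpaces(String unicodeString, String indices, double threshold) {
        // Assumption: entry i of Indices belongs to character i of unicodeString.
        String[] entries = indices.split(";", -1);
        StringBuilder out = new StringBuilder();
        int last = unicodeString.length() - 1;
        for (int i = 0; i < unicodeString.length(); i++) {
            out.append(unicodeString.charAt(i));
            // No entry for this character, or nothing follows it: no space to add.
            if (i >= entries.length || i == last) {
                continue;
            }
            // The second comma-separated field, if present, is the advance width.
            String[] fields = entries[i].split(",", -1);
            if (fields.length > 1 && !fields[1].trim().isEmpty()
                    && Double.parseDouble(fields[1].trim()) >= threshold) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: the fourth glyph carries a large advance (250).
        String text = "Hereis";
        String indices = "43;72;85;72,250;76;86";
        System.out.println(insertSpaces(text, indices, SPACE_THRESHOLD)); // prints "Here is"
    }
}
```

Passing the threshold as a parameter keeps it easy to experiment with other cut-off values against real files.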

