Java / PDF text rendering

Question

I'm working on my own PDF-generating library in Java and I'm having some trouble with font/text rendering. The text displayed in Java (font, word spacing, character spacing, ...) differs from the text displayed in the PDF.

In the example below, I'm using the font "Times New Roman", which is one of the PDF base fonts (so I don't have to compute and output all the font metrics into the PDF).

Concretely, my generated PDF contains this:

BT
/F5 16 Tf
849 921 Td
(Normal Return Distribution) Tj
ET

The font F5 is defined by object 29 0 R, which contains only the base font name, so no font metrics are specified:

29 0 obj <</Type /Font /Subtype /Type1 /BaseFont /Times-Roman>>
endobj

In Java, I'm using:

g2d.setFont(new Font("TimesRoman", Font.PLAIN, 16));
g2d.drawString("Normal Return Distribution", 849, 921);

I've drawn the text inside a rectangle that matches the text bounds. In Java everything is fine (I computed the string bounds in Java), but in Adobe Acrobat Reader the text is bigger than the rectangle.

Here is a screenshot. (I built it by taking a screenshot of Adobe Acrobat Reader displaying my PDF and a screenshot of my program displaying the buffered image, then pasting the relevant portion of the PDF screenshot below the rectangle from my program's screenshot in MSPaint. To get rectangles of the same size, I had to display the PDF in Adobe at 65.5% of the original size.)

So we can see that the font used in Java and in Adobe to display the text is the same, but the text seems a little bigger in Adobe. In fact, if I superimpose two words (one from Java on top of one from Adobe), the word spacing and the letter spacing seem to be the same, but some letters differ in width by 1 pixel.

Why? What can I do to solve this? I've tried playing (in the PDF) with character spacing (Tc operator), word spacing (Tw operator), and horizontal scaling (Tz operator). I think that can "work", but why aren't the scaling and spacing the same in both programs? Aren't these (default) parameters part of the font file (which is a TrueType one)? And how can I retrieve them correctly (without hard-coding them in my Java code)?

Thanks

EDIT

So, as you've both explained, I'm now looking into not using the PDF base fonts, to be sure that the same font (TTF file) is used by Java and Adobe Reader. But I still have a problem (the same one?).

In the PDF output, I'm generating the font like this:

31 0 obj <<
/Type /Font
/FirstChar 0
/LastChar 255
/Widths[1298 ... 646]
/Name /F7
/Encoding /WinAnsiEncoding
/Subtype /TrueType /BaseFont /Tahoma /FontDescriptor 32 0 R
>>
endobj

32 0 obj <<
/Type /FontDescriptor
/Ascent 1299
/CapHeight 1298
/Descent -269
/Flags 32
/FontBBox [0 -269 2012 1299]
/FontName /Tahoma
/ItalicAngle 0
/StemV 126
/XHeight 1298
>>
endobj

If I have understood the specification correctly, all numbers (widths, ascent, descent, ...) are relative to glyph units (1em based?), where 1em = 1000 (and 1em is the width of the M character).

So to generate all these parameters from Java, I first try to find the Java font size at which the width of the M character equals 1000 (because Java does not give access to these parameters in the Font class or other classes, and PDF needs them even though this information is in the TTF file?).

float size = 1f;
while (true) {
    font = font.deriveFont(size);
    fm = g2d.getFontMetrics(font);
    int em = fm.charWidth('M');
    if (em >= 1000)
        break;
    size += 1;
}

Then I can generate all required parameters. For example, for the Widths array (which holds the width of each character):

String pdfWidths = "";
for (int i = 0; i <= 255; ++i) {
    int width = fm.charWidth(i);
    pdfWidths += width + " ";
}

But doing this, my text still overflows the rectangle in Adobe Reader. I have to set the em limit in my brute-force loop to 780 for Tahoma, to 850 for Verdana, and so on, to get similar text (not exactly the same, but perhaps that is due to the anti-aliasing algorithm?) (see the screenshot below). So it's not a constant limit (which in theory should equal 1000) but a variable one. Is that correct? (I think not.) If yes, how do I find this limit? If not, what is wrong?

Thanks again.

EDIT

Simply setting the font size to 1000, without brute-forcing the em/line height size, gives a PDF result that is really close to Java's.

font = font.deriveFont(1000f);
fm = g2d.getFontMetrics(font);
//Retrieve Widths attribute
_pdfWidths = "";
for (int i = _firstChar; i <= _lastChar; ++i) {
    int width = fm.charWidth(i);
    _pdfWidths += width + " ";
}

But there is still a small difference; maybe it is due to the text drawing algorithm (perhaps kerning differs between Java and Adobe Reader?). See the image below: with Verdana, the text is slightly narrower in the PDF than in Java.

Answer

This answer is essentially a roundup of my comments.

The first attempt used the font "Times New Roman" (actually Times-Roman), one of the PDF base fonts, for the PDF (so as not to compute and output all the font metrics into the PDF) and "TimesRoman" for Java AWT. It resulted in visibly different text rendering, on which I commented:

Essentially: your app uses what Java AWT considers TimesRoman plain at 16pt, applying font metrics in its own manner; your PDF viewer uses what it considers Times-Roman at 16 user space units, applying font metrics as specified in the PDF specification. All you can expect is some similarity (otherwise one of those contexts would have made a very bad choice), but not identity.

David explained this in more detail in items 1 (different fonts) and 3 (different application of kerning and substitutions) of his answer.
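
To see what Java AWT actually resolves a requested name to, one can ask the Font object itself. The following is only a minimal sketch: the class and method names (FontResolutionCheck, describe) are made up for illustration, and the output depends entirely on the fonts installed on the machine and on the JDK in use.

```java
import java.awt.Font;

final class FontResolutionCheck {

    /** Describes which family and face AWT picked for the requested name. */
    static String describe(String requestedName) {
        Font font = new Font(requestedName, Font.PLAIN, 16);
        return "requested=" + requestedName
                + ", family=" + font.getFamily()     // family AWT resolved to
                + ", face=" + font.getFontName()     // font face name
                + ", psName=" + font.getPSName();    // PostScript name
    }

    public static void main(String[] args) {
        // The name from the question and a name an installed TTF might carry.
        System.out.println(describe("TimesRoman"));
        System.out.println(describe("Times New Roman"));
    }
}
```

If the reported family is not the font the PDF viewer substitutes for Times-Roman, differing glyph widths are to be expected.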

Furthermore,

BTW: Beginning with PDF 1.5, the special treatment given to the standard 14 fonts is deprecated (section 9.6.2.1 in ISO 32000-1). Thus, by not including the font metrics explicitly in the PDF, you are doing something that has been deprecated for many years.

The next attempt avoided the PDF base fonts, to be sure that the same font (TTF file) is used by Java and Adobe Reader, and therefore required calculating the character widths to embed in the PDF. Here the assumption was made that all numbers (widths, ascent, descent, ...) are relative to glyph units (1em based?), where 1em = 1000 (and 1em is the width of the M character). Consequently, the attempt was to find the Java font size at which the width of the M character equals 1000 and then to generate all required parameters from that font.

No, not em-based. Instead: a font defines the glyphs at one standard size. This standard is arranged so that the nominal height of tightly spaced lines of text is 1 unit. Thus, 1000 glyph space units are the height of that nominal line.

This led to the question of what exactly that "nominal line" is. Fortunately, it is easier to approach this the other way around: a font at size 1 is by definition a font for which that "nominal line" has a height of 1. Thus,

Shouldn't the Widths array be filled with 1000 * fm.charWidth(i), where fm are the metrics of the font at size 1? Or, as AWT works with int widths, with fm.charWidth(i), where fm are the metrics of the font at size 1000?
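
Under that reading, an entry w of the Widths array stands for w/1000 of the font size. The advance a viewer adds up for a string is then the sum of the entries times the font size, divided by 1000 (assuming Tc and Tw are 0 and Tz is 100). Below is a small sketch of that arithmetic. The class name PdfTextWidth and the width values are hypothetical, and each Java char is assumed to map directly to a one-byte character code.

```java
final class PdfTextWidth {

    /**
     * Sums the Widths entries for the given text and scales the result
     * from glyph space (1/1000 unit) to user space at the given font size.
     */
    static double textWidth(String text, int[] widths, int firstChar, double fontSize) {
        long total = 0;
        for (int i = 0; i < text.length(); i++) {
            int index = text.charAt(i) - firstChar;   // position inside /Widths
            if (index < 0 || index >= widths.length) {
                throw new IllegalArgumentException("No width for char at index " + i);
            }
            total += widths[index];
        }
        return total * fontSize / 1000.0;
    }

    public static void main(String[] args) {
        // Invented widths for 'A', 'B', 'C' (FirstChar 65), not real font data.
        int[] widths = {500, 600, 700};
        System.out.println(textWidth("ABC", widths, 65, 16.0));
    }
}
```

Comparing this number with the string bounds computed in Java shows directly how far the two renderings will drift apart for a given string.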

Taking this into account, simply setting the font size to 1000, without brute-forcing the em/line height size, gave a PDF result really close to Java's. But there was still a small difference, possibly due to the text drawing algorithm (perhaps kerning differs between Java and Adobe Reader?): with Verdana, the text is slightly narrower in the PDF than in Java.

Have a look at the FontMetrics.charWidth method comment: "Note that the advance of a String is not necessarily the sum of the advances of its characters." AWT may additionally apply kerning etc., resulting in slight deviations. In a PDF, though, when a single Tj operation is used, those advances do add up.
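
The deviation can be checked on the Java side by comparing the sum of the charWidth values with stringWidth for the same text. The sketch below does this twice: once for a plain font and once for a copy with kerning switched on explicitly through TextAttribute.KERNING. The class name AdvanceComparison, the sample text, and the choice of the logical Serif font are assumptions for illustration. Whether the two numbers differ at all depends on the font and the JDK.

```java
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.font.TextAttribute;
import java.awt.image.BufferedImage;
import java.util.Collections;

final class AdvanceComparison {

    /** Returns {sum of the per-character advances, advance of the whole string}. */
    static int[] advances(Font font, String text) {
        // A 1x1 scratch image is enough to obtain a Graphics2D for metrics.
        BufferedImage scratch = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g2d = scratch.createGraphics();
        try {
            FontMetrics fm = g2d.getFontMetrics(font);
            int sum = 0;
            for (int i = 0; i < text.length(); i++) {
                sum += fm.charWidth(text.charAt(i));
            }
            return new int[] {sum, fm.stringWidth(text)};
        } finally {
            g2d.dispose();
        }
    }

    public static void main(String[] args) {
        Font plain = new Font(Font.SERIF, Font.PLAIN, 1000);
        Font kerned = plain.deriveFont(
                Collections.singletonMap(TextAttribute.KERNING, TextAttribute.KERNING_ON));
        for (Font font : new Font[] {plain, kerned}) {
            int[] a = advances(font, "AVATAR");
            System.out.println("sum of charWidth=" + a[0] + ", stringWidth=" + a[1]);
        }
    }
}
```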

If you want to use kerning in PDFs, you have to write those deviations from the standard widths explicitly. Here the TJ operator is quite handy, as it takes a mixed array of strings and offsets as its parameter.

If you want to substitute some characters with, e.g., ligatures, you also have to do that yourself.
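
As an illustration, such a TJ array could be assembled as follows. This is a sketch under assumptions: the class TjArrayBuilder and its tighten parameter are hypothetical, the adjustment values in the example are invented rather than taken from a real kerning table, and the text is assumed to be plain ASCII. A number in a TJ array is subtracted from the current horizontal position, in thousandths of a text space unit, so a positive value moves the next glyph closer.

```java
final class TjArrayBuilder {

    /**
     * Builds a TJ operation. tighten[i] is the amount (in 1/1000 text space
     * units) by which the glyph following text.charAt(i) is pulled left;
     * 0 or a missing entry means no adjustment.
     */
    static String build(String text, int[] tighten) {
        StringBuilder out = new StringBuilder("[");
        StringBuilder run = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Parentheses and backslashes must be escaped in PDF literal strings.
            if (c == '(' || c == ')' || c == '\\') {
                run.append('\\');
            }
            run.append(c);
            boolean last = i == text.length() - 1;
            int adjustment = (!last && i < tighten.length) ? tighten[i] : 0;
            if (adjustment != 0) {
                // Close the current string run and emit the offset after it.
                out.append('(').append(run).append(") ").append(adjustment).append(' ');
                run.setLength(0);
            }
        }
        if (run.length() > 0) {
            out.append('(').append(run).append(')');
        }
        return out.append("] TJ").toString();
    }

    public static void main(String[] args) {
        // Invented adjustments for the pairs "AV" and "VA".
        System.out.println(build("AVA", new int[] {80, 80}));
    }
}
```

Keeping unadjusted characters together in one string run keeps the content stream compact; only positions with a non-zero adjustment split the string.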
