Writing Unicode text on a visible signature - PDFBox

Question

I build a PDF using PDFBox. It has a visible signature, too. I write some text like this:

...
builder.append("Tm\n");
builder.append(" /F1 " + fontSize + "\n");
builder.append("Tf\n");
builder.append("(hello world)");
builder.append("Tj\n");
builder.append("ET");
...
PDStream stream = ...;
// characters outside ISO-8859-1 end up as '?' in these bytes
try (OutputStream out = stream.createOutputStream()) {
    out.write(builder.toString().getBytes("ISO-8859-1"));
}

Everything works well. But if I put some Unicode characters into the builder, I get "???" instead of the text.

Sample PDF: link here

QUESTION 1) When I look at the PDF structure, there are question marks instead of the text. How do I write Unicode characters?

9 0 obj
<<
/Type /XObject
/Subtype /Form
/BBox [100 50 0 0]
/Matrix [1 0 0 1 0 0]
/Resources <<
/Font 11 0 R
/XObject <<
/img0 12 0 R
>>
/ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
>>
/FormType 1
/Length 13 0 R
>>
stream
q 93.70079 0 0 50 0 0 cm /img0 Do Q
BT
1 0 0 1 93.70079 25 Tm
 /F1 2
Tf
(????)Tj
ET
endstream
endobj

My font uses WinAnsiEncoding. Can I use another encoding in PDFBox?

PDFont font = PDTrueTypeFont.loadTTF(template, new File("//fontName.ttf"));
font.setFontEncoding(new WinAnsiEncoding());

QUESTION 2) I've embedded a font in the PDF, but the text (in the visible signature rectangle) is not written with this font. Why?

Question 3) When I remove the font, the text is still there (when the text is in English). What is the default font? And /F1: is that the first font?

Question 4) How can I calculate the width of my text in the visible signature? Any ideas?

Recommended answer

QUESTION 1) When I look at the PDF structure, there are question marks instead of the text. How do I write Unicode characters?

I assume that by Unicode characters you mean characters present in Unicode but not in, e.g., Latin-1. (The letter 'a', for example, has a Unicode representation, too, but most likely won't cause you trouble.)

You call getBytes("ISO-8859-1") on your StringBuilder result. Your Unicode characters most likely are not in ISO 8859-1. Thus, String.getBytes puts the ASCII code of a question mark in their place.
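
To see the replacement in isolation, here is a small plain-Java sketch with no PDFBox involved. It uses the `getBytes(Charset)` overload, whose Javadoc documents that unmappable characters are replaced with the charset's default replacement bytes (for ISO-8859-1 that is `?`); the `getBytes("ISO-8859-1")` overload used in the question formally leaves this case unspecified but, in practice, behaves the same way on common JDKs. The class name `Latin1Replacement` is only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Shows how String.getBytes silently replaces characters that the target
// charset cannot represent. For ISO-8859-1 the replacement byte is '?' (0x3F).
class Latin1Replacement {

    static byte[] encodeLatin1(String text) {
        // getBytes(Charset) never throws for unmappable characters;
        // it substitutes the charset's default replacement bytes instead.
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] latin = encodeLatin1("caf\u00e9"); // e-acute exists in Latin-1 as 0xE9
        byte[] cyrillic = encodeLatin1("\u043f\u0440\u0438\u0432\u0435\u0442"); // six Cyrillic letters
        System.out.println(Arrays.toString(latin)); // [99, 97, 102, -23]
        System.out.println(new String(cyrillic, StandardCharsets.ISO_8859_1)); // ??????
    }
}
```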

If the question were merely how to write Unicode characters to an output stream in Java, the answer would be easy: choose an encoding that contains all your characters and that all consumers of your program support, e.g. UTF-8, and call String.getBytes for that encoding.
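
A minimal sketch of that plain-Java case; the helper name `writeUtf8` is hypothetical. This is fine for text files or protocols whose consumers expect UTF-8, but not for strings inside a PDF content stream, for the reasons that follow.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Plain Java, no PDF involved: write Unicode text to a stream as UTF-8,
// an encoding that can represent every Unicode character.
class Utf8Output {
    static void writeUtf8(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        String text = "\u043f\u0440\u0438\u0432\u0435\u0442"; // six Cyrillic letters
        writeUtf8(buffer, text);
        System.out.println(buffer.size()); // 12 (two bytes per letter in UTF-8)
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8).equals(text)); // true
    }
}
```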

The case at hand is different, though, as you want to serialize that information as a PDF form XObject stream. In this context your whole approach lies somewhere between highly questionable and completely wrong:

In PDFs, each font might come along with its own encoding, which might be similar to a common encoding, e.g. /WinAnsiEncoding, or completely custom. These encodings, furthermore, are in many cases restricted to one byte per character, but in the case of composite fonts they can also be multi-byte encodings.

As a corollary, not all elements of the stream need to be encoded using the same encoding. E.g. the operator names Tm, Tf, and Tj are encoded using their ASCII codes, while the characters of a string to be displayed have to be encoded using the respective font's encoding (and may thereafter be hex-encoded yet again if written in angle brackets <>).
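
The hex form in angle brackets is produced from bytes that have already been encoded with the font's encoding. A small sketch, assuming the font uses WinAnsiEncoding and approximating it with Java's `windows-1252` charset (close, but not guaranteed identical for every code); `toHexOperand` is a hypothetical helper name, not a PDFBox method.

```java
import java.nio.charset.Charset;

// Turns bytes that are ALREADY encoded with the font's encoding into a PDF
// hex string operand such as <48656C6C6F>.
class HexStringOperand {
    static String toHexOperand(byte[] fontEncodedBytes) {
        StringBuilder sb = new StringBuilder("<");
        for (byte b : fontEncodedBytes) {
            sb.append(String.format("%02X", b & 0xFF)); // two uppercase hex digits per byte
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // ASSUMPTION: windows-1252 approximates the font's WinAnsiEncoding.
        Charset assumedFontEncoding = Charset.forName("windows-1252");
        byte[] codes = "Hello \u20ac".getBytes(assumedFontEncoding);
        System.out.println(toHexOperand(codes) + " Tj"); // <48656C6C6F2080> Tj
    }
}
```

The operator name `Tj` stays plain ASCII; only the operand bytes depend on the font.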

Thus, creating the stream as a string and then converting it to bytes with a single encoding only works if all used fonts use the same encoding (for the code points actually used), and that encoding furthermore needs to be ASCII-compatible to represent the operators correctly.
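
The ASCII requirement can be checked mechanically: encode a sample of operator text with the candidate charset and compare the result against US-ASCII. This is a sketch with a hypothetical helper name and an illustrative, not exhaustive, sample string. Passing the check is necessary but not sufficient: UTF-8 passes it, yet its multi-byte sequences for non-ASCII characters still won't match what a simple single-byte font encoding expects.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Checks whether a charset encodes typical content stream operator text
// (ASCII letters, digits, spaces, delimiters) to the same bytes as ASCII.
class AsciiCompatibility {
    // A sample of operator and operand text; not an exhaustive list.
    private static final String OPERATOR_SAMPLE = "BT /F1 12 Tf 1 0 0 1 10 20 Tm (x) Tj ET\n";

    static boolean preservesOperators(Charset charset) {
        byte[] expected = OPERATOR_SAMPLE.getBytes(StandardCharsets.US_ASCII);
        byte[] actual = OPERATOR_SAMPLE.getBytes(charset);
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        System.out.println(preservesOperators(StandardCharsets.ISO_8859_1)); // true
        System.out.println(preservesOperators(StandardCharsets.UTF_8));      // true
        // UTF-16 adds a byte order mark and uses two bytes per character.
        System.out.println(preservesOperators(StandardCharsets.UTF_16));     // false
    }
}
```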

Essentially, you should directly construct the stream in some byte buffer and for each inserted element use the appropriate encoding. In case of characters to be displayed, therefore, you have to be aware of the encoding used by the currently selected font.
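
One way to follow that advice is sketched below: a tiny builder that writes operators, names, and numbers as US-ASCII bytes and encodes shown text with the selected font's encoding, emitting it as a hex string so that parentheses and backslashes need no escaping. Loud assumptions: the font's encoding is modeled by a `java.nio.charset.Charset` (e.g. `windows-1252` standing in for WinAnsiEncoding), which does not hold for composite fonts or fonts with custom `/Differences`; the class `ContentStreamBytes` and its methods are hypothetical, not PDFBox API. Unmappable characters raise `CharacterCodingException` instead of silently turning into `?`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical content stream builder that works on bytes, not on one big String.
// ASSUMPTION: the current font's encoding can be modeled by a Java Charset.
class ContentStreamBytes {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Charset fontEncoding;

    ContentStreamBytes(Charset fontEncoding) {
        this.fontEncoding = fontEncoding;
    }

    // Operators, names, and numbers are plain ASCII tokens, each followed by a space.
    ContentStreamBytes token(String ascii) {
        byte[] bytes = (ascii + " ").getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        return this;
    }

    // Shown text: encode with the font encoding, write as <hex> and append Tj.
    ContentStreamBytes showText(String text) throws CharacterCodingException {
        CharsetEncoder encoder = fontEncoding.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT); // fail instead of '?'
        ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
        StringBuilder hex = new StringBuilder("<");
        while (encoded.hasRemaining()) {
            hex.append(String.format("%02X", encoded.get() & 0xFF));
        }
        hex.append('>');
        return token(hex.toString()).token("Tj");
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}
```

For example, `new ContentStreamBytes(Charset.forName("windows-1252"))` followed by `token("BT").token("/F0").token("12").token("Tf").showText("Hello €").token("ET")` yields the bytes of `BT /F0 12 Tf <48656C6C6F2080> Tj ET `. When the font changes, the encoding used for subsequent text has to change with it.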

If you want to do it right, first study the PDF specification ISO 32000-1, especially the sections on general syntax and chapter 9 Text.

QUESTION 2) I've embedded a font in the PDF, but the text (in the visible signature rectangle) is not written with this font. Why?

In the resources of the stream XObject in question there is exactly one embedded font, associated with the name /F0. In your stream, though, you have /F1 2 Tf, i.e. you select a font /F1 at size 2.
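
When debugging this kind of mismatch, a quick check is to compare the names selected with Tf against the keys of the /Font resource dictionary. The sketch below uses a deliberately naive regular expression (it would also match Tf-like text inside string literals, and it ignores comments); `undefinedFonts` is a hypothetical helper, and the resource names are passed in by hand rather than read through PDFBox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Debugging aid (hypothetical, not a PDFBox API): list font names selected via
// "/Name size Tf" that are missing from the given /Font resource names.
class FontResourceCheck {
    // A name operand, a numeric size, then Tf; the separators may be newlines.
    private static final Pattern TF =
            Pattern.compile("/([^\\s/\\[\\]()<>{}%]+)\\s+(?:[-+]?[0-9.]+)\\s+Tf\\b");

    static List<String> undefinedFonts(String content, Set<String> resourceFontNames) {
        List<String> missing = new ArrayList<>();
        Matcher m = TF.matcher(content);
        while (m.find()) {
            String name = m.group(1);
            if (!resourceFontNames.contains(name) && !missing.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The stream from the question, with only /F0 defined in the resources.
        String stream = "BT\n1 0 0 1 93.70079 25 Tm\n /F1 2\nTf\n(????)Tj\nET";
        System.out.println(undefinedFonts(stream, Set.of("F0"))); // [F1]
    }
}
```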

Question 3) When I remove the font, the text is still there (when the text is in English). What is the default font?

According to the specification, section 9.3.1,

font shall be the name of a font resource in the Font subdictionary of the current resource dictionary [...] There is no initial value for either font or size

Most likely, though, PDF viewers use some default font for the sake of compatibility with old or broken documents.

Question 4) How can I calculate the width of my text in the visible signature? Any ideas?

The width obviously depends on the metrics of the font used (glyph widths in this case) and on the graphics state you set (font size, character spacing, word spacing, current transformation matrix, text matrix, ...).
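
For horizontal text, ISO 32000-1 (section 9.4.4) gives the advance per glyph as tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, where w0 is the glyph width in thousandths of a text space unit, Tfs the font size, Tc the character spacing, Tw the word spacing (applied only to the single-byte code 32), Th the horizontal scaling, and Tj here the numeric adjustment from a TJ array (zero for a plain Tj operator). The sketch below sums that formula over the codes of a string; it assumes a simple single-byte font and that neither the text matrix nor the CTM scales, so the result is also the width in the form's space. `TextWidth` is a hypothetical name.

```java
// Horizontal advance of a string shown with the Tj operator, per ISO 32000-1, 9.4.4:
//   tx = ((w0 - adjustment/1000) * Tfs + Tc + Tw) * Th
// The TJ array adjustment is 0 here, because plain Tj has none.
class TextWidth {
    /**
     * @param codes       single-byte character codes as written in the string operand
     * @param widths      glyph widths indexed by code, in 1/1000 text space units
     * @param fontSize    Tfs, set by the Tf operator
     * @param charSpacing Tc, set by the Tc operator (default 0)
     * @param wordSpacing Tw, set by the Tw operator (default 0), applied to code 32 only
     * @param hScale      Th = Tz / 100 (default 1.0)
     */
    static double width(byte[] codes, double[] widths, double fontSize,
                        double charSpacing, double wordSpacing, double hScale) {
        double total = 0;
        for (byte b : codes) {
            int code = b & 0xFF;
            double advance = widths[code] / 1000.0 * fontSize + charSpacing;
            if (code == 32) {
                advance += wordSpacing;
            }
            total += advance * hScale;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] widths = new double[256];
        java.util.Arrays.fill(widths, 500); // hypothetical metrics: every glyph 500 units
        widths[32] = 250;                   // except the space
        byte[] codes = {'a', ' ', 'b'};
        System.out.println(width(codes, widths, 10, 0, 0, 1)); // 12.5
    }
}
```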

In your case you hardly do anything in the graphics state, so only the selected font size is of interest there. The more interesting part, therefore, is the character widths from the font metrics. As long as you use the standard 14 fonts, you find the metrics here. As soon as you start using other, custom fonts, you have to read them from the font definition files yourself.
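
As an illustration, assume the text were set in Helvetica, one of the standard 14 fonts (the question actually uses an embedded TrueType font, whose widths you would read from the font itself). The sketch hard-codes the Helvetica AFM widths for only the characters of "hello world" and scales by the font size of 2 used in the question. If you work through PDFBox instead, `PDFont.getStringWidth(String)` returns the width in thousandths of a text space unit, which you scale by the font size in the same way.

```java
import java.util.Map;

// Width of a string in Helvetica, using a SUBSET of the widths from the
// Adobe Helvetica AFM (units: 1/1000 of the font size). Only the characters
// of "hello world" are included; anything else throws.
class HelveticaWidth {
    private static final Map<Character, Integer> WIDTHS = Map.of(
            ' ', 278, 'd', 556, 'e', 556, 'h', 556, 'l', 222,
            'o', 556, 'r', 333, 'w', 722);

    static double width(String text, double fontSize) {
        int total = 0;
        for (char c : text.toCharArray()) {
            Integer w = WIDTHS.get(c);
            if (w == null) {
                throw new IllegalArgumentException("No width in this subset for '" + c + "'");
            }
            total += w;
        }
        // Assumes Tc = Tw = 0, 100% horizontal scaling, and no scaling in Tm or the CTM.
        return total * fontSize / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(width("hello world", 2)); // 9.558
    }
}
```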
