Unicode in PDF

Question

My program generates relatively simple PDF documents on request, but I'm having trouble with Unicode characters, like kanji or odd math symbols. To write a normal string in PDF, you place it in parentheses:

(something)

There is also the option to escape a character with octal codes:

(\527)

but this only goes up to 512 characters. How do you encode or escape higher characters? I've seen references to byte streams and hex-encoded strings, but none of the references I've read seem to be willing to tell me how to actually do it.
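For readers who want to see the two string forms concretely, here is a minimal Java sketch. It rests on two points from the PDF specification: an octal escape in a literal string denotes a single byte, so the usable range ends at \377, and a Unicode "text string" (document info, bookmarks, annotations) can be written as a hex string holding UTF-16BE bytes after an FE FF byte order mark. The class `PdfStrings` and both method names are hypothetical, not part of gnujpdf or iText. This does not cover text drawn on a page: there the bytes are character codes in the current font, so showing kanji generally also needs a composite font set up to match.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from gnujpdf or iText) for writing PDF string objects. */
final class PdfStrings {

    private PdfStrings() {}

    /**
     * Writes raw bytes as a PDF literal string: ( ... ).
     * Backslash and parentheses are escaped; bytes outside printable ASCII
     * become three-digit octal escapes, which top out at \377 (one byte).
     */
    static String toLiteralString(byte[] bytes) {
        StringBuilder out = new StringBuilder("(");
        for (byte b : bytes) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v == '(' || v == ')' || v == '\\') {
                out.append('\\').append((char) v);
            } else if (v < 0x20 || v > 0x7E) {
                out.append(String.format("\\%03o", v));
            } else {
                out.append((char) v);
            }
        }
        return out.append(')').toString();
    }

    /**
     * Writes text as a PDF hex string: UTF-16BE bytes preceded by the
     * FE FF byte order mark that marks a Unicode text string.
     */
    static String toHexTextString(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE); // encoder adds no BOM
        StringBuilder out = new StringBuilder("<FEFF");
        for (byte b : utf16) {
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.append('>').toString();
    }
}
```

Under these assumptions, `PdfStrings.toHexTextString("A")` yields `<FEFF0041>`, which could be used as the value of a /Title entry in the document information dictionary.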

Edit: Alternatively, point me to a good Java PDF library that will do the job for me. The one I'm currently using is a version of gnujpdf (in which I've fixed several bugs, since the original author appears to have gone AWOL) that allows you to program against an AWT Graphics interface, and ideally any replacement should do the same.

The alternatives seem to be either HTML -> PDF, or a programmatic model based on paragraphs and boxes that feels very much like HTML. iText is an example of the latter. This would mean rewriting my existing code, and I'm not convinced either would give me the same flexibility in layout.

Edit 2: I didn't realise before, but the iText library has a Graphics2D API and seems to handle Unicode perfectly, so that's what I'll be using. Though it isn't an answer to the question as asked, it solves the problem for me.

Edit 3: iText is working nicely for me. I guess the lesson is, when faced with something that seems pointlessly difficult, look for somebody who knows more about it than you.

Recommended answer

The simple answer is that there's no simple answer. If you take a look at the PDF specification, you'll see an entire chapter, and a long one at that, devoted to the mechanisms of text display. I implemented all of the PDF support for my company, and handling text was by far the most complex part of the exercise. The solution you discovered, using a third-party library to do the work for you, is really the best choice, unless you have very specific, special-purpose requirements for your PDF files.
