Generation of PDF from HTML with non-Latin characters using ITextRenderer does not work
Problem description
This is the second day I have spent investigating this, with no results. At least now I can ask something very specific.
I am trying to write valid HTML that contains some non-Latin characters to a PDF file using iText (http://itextpdf.com/), and more specifically using ITextRenderer from Flying Saucer.
My short example starts by initializing a string variable doc with this value:
String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
+ "<body>Some greek characters: Καλημέρα Some greek characters"
+ "</body></html>";
Here is the code that I use for debugging purposes. I save this string to an HTML file and then open it in a browser, just to double-check that the HTML content is valid and that I can still read the Greek characters:
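// write the string to an HTML file for debugging purposes, explicitly as UTF-8
// (FileWriter would use the platform default encoding, which may not support Greek)
File newTextFile = new File("C:/work/test.html");
Writer fw = new OutputStreamWriter(new FileOutputStream(newTextFile), "UTF-8");
try {
    fw.write(doc);
} finally {
    fw.close();
}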
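Before Java 11, java.io.FileWriter has no constructor that takes a charset, so it always encodes with the platform default, while doc declares UTF-8. On a Windows machine whose default is a Latin-only code page, Greek letters cannot be written faithfully, so the debug HTML file is only a reliable check if the write encoding can represent the text. A minimal sketch for checking that, assuming Java 7 or later for StandardCharsets (CharsetCheck, GREEK and canEncode are hypothetical names):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {

    // "Καλημέρα" written as Unicode escapes, so the value does not depend on the
    // encoding javac assumes for this source file.
    static final String GREEK = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";

    // Returns true if every character of the text can be represented in the charset.
    static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The charset a FileWriter created without an explicit charset uses.
        Charset platformDefault = Charset.defaultCharset();
        System.out.println("platform default: " + platformDefault);
        System.out.println("default can encode Greek: " + canEncode(GREEK, platformDefault));
        System.out.println("UTF-8 can encode Greek: " + canEncode(GREEK, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 can encode Greek: " + canEncode(GREEK, StandardCharsets.ISO_8859_1));
    }
}
```

If the default charset cannot encode the text, writing through an OutputStreamWriter with an explicit "UTF-8" charset keeps the file consistent with its XML declaration.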
The next step is to write this value to the PDF file. This is my code:
ITextRenderer renderer = new ITextRenderer();
// register TrueType fonts with Identity-H (Unicode) encoding - if a path is wrong, an exception will be thrown
renderer.getFontResolver().addFont("c:/work/fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// parse the XHTML string into a DOM without DTD validation
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setValidating(false);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
// let Flying Saucer resolve XHTML DTDs and entities from its bundled copies
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(doc.getBytes("UTF-8")));
renderer.setDocument(document, null);
renderer.layout();
// os is an OutputStream for the PDF, opened elsewhere (not shown in this snippet)
renderer.createPDF(os);
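One way to narrow the problem down is to confirm that the Greek text is still intact after the parsing step, before Flying Saucer is involved at all. The sketch below repeats the parse with the JDK's own DocumentBuilderFactory; FSEntityResolver is left out because this test document has no DOCTYPE to resolve. ParseCheck, DOC and bodyText are hypothetical names, and Java 7 or later is assumed:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseCheck {

    // Same markup as the question's doc; "Καλημέρα" is written as Unicode escapes.
    static final String DOC = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
            + "<body>Some greek characters: \u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1 Some greek characters"
            + "</body></html>";

    // Parses the XHTML from UTF-8 bytes without validation, as in the question,
    // and returns the text content of the first <body> element.
    static String bodyText(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return document.getElementsByTagName("body").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the XHTML string", e);
        }
    }

    public static void main(String[] args) {
        // Expected to print true: the Greek word survives parsing into the DOM.
        System.out.println(bodyText(DOC).contains("\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"));
    }
}
```

If this prints true, the characters do reach the renderer and are lost at layout time, which suggests the cause is font selection rather than the string's encoding.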
The final outcome of my code is:
In the HTML file I get: Some greek characters: Καλημέρα Some greek characters (expected)
In the PDF file I get: Some greek characters: Some greek characters (unexpected: the Greek characters are missing!)
Dependencies:
- java version "1.6.0_27"
- itext-2.0.8.jar
- de.huxhorn.lilith.3rdparty.flyingsaucer.core-renderer-8Pre2.jar
I have also experimented with many more fonts, but I suspect my problem has nothing to do with using the wrong fonts. Any help is more than welcome.
Thanks
Recommended answer
I am from the Czech Republic and had the same problem with our national characters! After some searching, I managed to solve it with this solution.
Specifically, with this (which you already have):
renderer
.getFontResolver()
.addFont(fonts.get(i).getFile().getPath(),
BaseFont.IDENTITY_H,
BaseFont.NOT_EMBEDDED);
Then the important part is in the CSS:
* {
font-family: Verdana;
/* font-family: Times New Roman; - alternative. Write the name without quotes! */
}
It seems to me that without that CSS your fonts are not used. When I remove these lines from the CSS, the encoding is broken again.
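In other words, the fix amounts to giving the document a stylesheet whose font-family names a font that was registered with addFont. A minimal sketch of building such a document string, assuming the registered TIMES.TTF files report the family name "Times New Roman" (StyledDoc and withFontFamily are hypothetical names, not part of Flying Saucer):

```java
public class StyledDoc {

    // Wraps body content in an XHTML document whose stylesheet applies fontFamily to
    // every element, mirroring the answer's "* { font-family: ... }" rule. The family
    // name is written without quotes, as the answer recommends. bodyContent is inserted
    // as-is, so it must already be well-formed XHTML.
    static String withFontFamily(String bodyContent, String fontFamily) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
                + "<head><style type=\"text/css\">* { font-family: " + fontFamily + "; }</style></head>"
                + "<body>" + bodyContent + "</body></html>";
    }

    public static void main(String[] args) {
        // "Καλημέρα" as Unicode escapes, independent of the source file encoding.
        String body = "Some greek characters: \u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1 Some greek characters";
        System.out.println(withFontFamily(body, "Times New Roman"));
    }
}
```

The resulting string can then go through the same parse, setDocument, layout and createPDF steps as in the question.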
Hope this helps!