getBytes() doesn't work for Cyrillic letters


Question

I found some answers, but none of them works for me. I want to make a PDF file from HTML, but my HTML contains Cyrillic letters, and I found that the problem has something to do with this simple code:

String s = "Здраво Kris";
byte bytes[] = s.getBytes("UTF-8");
String value = new String(bytes, "ISO-8859-1");
// I tried with new String(bytes, "UTF-8") but it didn't work

Then I pass the value to my PDF generator function, but it outputs only the part of the string s that is not Cyrillic, i.e. Kris:

 htp.CreatePDF("<html><head><title>kristijan</title></head><body><h1>" + value + "</h1></body></html>", "kris");

Recommended answer

Please take a look at my answer to this question: Can't get Czech characters while generating a PDF

Several things can go wrong in your code.

This is a very bad idea:

String s = "Здраво Kris";

Suppose that you send your .java file including this code to somebody who saves it as ASCII, then your source code will change into this:

String s = "Здраво Kris";

I've also seen this happen when storing a document into a source control system.
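To make this kind of corruption concrete, here is a minimal sketch that uses only the JDK. The class name `EncodingLoss` and its helper methods are made up for this illustration. It shows two typical failure modes: encoding the text as US-ASCII replaces every Cyrillic letter with `?`, and decoding UTF-8 bytes as ISO-8859-1 (the conversion from the question) turns each Cyrillic letter into two unrelated characters.

```java
import java.nio.charset.StandardCharsets;

public class EncodingLoss {
    // "Zdravo Kris" with the first word in Cyrillic, written with Unicode
    // escapes so that this source file itself stays pure ASCII
    static final String TEXT = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";

    // Simulates saving the text as ASCII and reading it back
    static String throughAscii(String s) {
        // Characters that ASCII cannot represent are replaced with '?'
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Reproduces the conversion from the question:
    // encode as UTF-8, then decode the same bytes as ISO-8859-1
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii(TEXT));
        // Each Cyrillic letter takes two bytes in UTF-8, so the wrongly
        // decoded string is longer than the original
        System.out.println(TEXT.length() + " -> " + utf8ReadAsLatin1(TEXT).length());
    }
}
```

In both cases the Latin part, Kris, survives, which matches the symptom described in the question.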

Bottom line: never use special encodings when writing source code with hard-coded strings. Either store the strings in a file, using the right encoding both to write and to read them, or use the Unicode notation if you insist on having hard-coded data in your source code.
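Both options can be sketched with the JDK alone. The class name `SafeStrings`, the helper `roundTrip`, and the temporary file are assumptions made for this illustration; they are not part of the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeStrings {
    // Option 1: Unicode notation keeps the source file pure ASCII,
    // so it no longer matters how the .java file is saved or compiled
    static final String GREETING = "\u0417\u0434\u0440\u0430\u0432\u043e Kris";

    // Option 2: keep the text in a file and name the charset explicitly
    // on both the write and the read (hypothetical helper)
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("greeting", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The text read back from the file matches the escaped literal
        System.out.println(roundTrip(GREETING).equals(GREETING));
    }
}
```

The point of naming `StandardCharsets.UTF_8` on both sides is that neither step falls back to the platform default encoding, which can differ between machines.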

Even if you store the file containing this string correctly, you have to be very careful when compiling the code. If the compiler uses a different encoding, s will be corrupted too.

You also have to make sure that you're reading the data correctly when converting the HTML to PDF. I assume that you are using XML Worker (and not the obsolete HTMLWorker class). There are different places where you can indicate which encoding to use.
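As a rough sketch of the input side only, the HTML string can be turned into a byte stream with an explicitly named charset, and the same charset is then used when the stream is read. The class `HtmlInput` and its methods are hypothetical and use only the JDK; with XML Worker, the stream and the charset would be handed to the parser instead of `readAll`, and the exact call for that should be checked against the XML Worker documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HtmlInput {
    // One charset constant shared by the producer and the consumer
    static final Charset HTML_CHARSET = StandardCharsets.UTF_8;

    // Encodes the HTML with the shared charset instead of the platform default
    static InputStream toStream(String html) {
        return new ByteArrayInputStream(html.getBytes(HTML_CHARSET));
    }

    // Stand-in for the parser: decodes the stream with the given charset
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>\u0417\u0434\u0440\u0430\u0432\u043e Kris</h1></body></html>";
        // Same charset on both sides: the text survives
        System.out.println(readAll(toStream(html), HTML_CHARSET).equals(html));
        // Mismatched charset: the Cyrillic letters are corrupted
        System.out.println(readAll(toStream(html), StandardCharsets.ISO_8859_1).equals(html));
    }
}
```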

Finally, you have to make sure that you use a font that supports Cyrillic characters. For instance: if you use the default font Helvetica, nothing will be rendered.

You can also find this information in the free ebook The Best iText Questions on StackOverflow.
