How to convert UTF8 to Unicode
Question
I'm trying to convert a UTF-8 string to a Java Unicode string.
String question = request.getParameter("searchWord");
byte[] bytes = question.getBytes();
question = new String(bytes, "UTF-8");
The input is Chinese characters, and when I compare the hex code of each character it matches the same Chinese character. So I'm pretty sure that the charset is UTF-8.
Where am I going wrong?
Recommended answer
There's no such thing as a "UTF-8 string" in Java. Everything is in Unicode.
When you call String.getBytes() without specifying an encoding, that uses the platform default encoding - that's almost always a bad idea.
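To illustrate the point about the platform default, here is a minimal sketch of encoding and decoding with an explicitly named charset. The class and method names (ExplicitCharsetDemo, toUtf8, fromUtf8) are made up for this example; StandardCharsets has been part of the standard library since Java 7.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode with an explicitly named charset instead of the platform default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that are known to be UTF-8 back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String original = "\u4E2D\u6587";
        byte[] utf8 = toUtf8(original);
        System.out.println(utf8.length);                      // 6 (three bytes per character)
        System.out.println(fromUtf8(utf8).equals(original));  // true
    }
}
```

Because both directions name UTF-8, the round trip does not depend on the machine the code runs on. It still cannot repair a string that was decoded incorrectly before it reached this code.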
You shouldn't have to do anything to get the right characters here - the request should be handling it all for you. If it's not doing so, then chances are it's lost data already.
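As a rough illustration of how data can already be lost before your code runs, the sketch below decodes the same UTF-8 bytes with two wrong charsets. It is only a simulation: the class and method names are hypothetical, and which charset a real request actually used depends on the container and its configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongDecodingDemo {
    // Simulates a layer that decoded UTF-8 bytes with the wrong charset.
    static String decodeWrongly(String original, Charset wrong) {
        return new String(original.getBytes(StandardCharsets.UTF_8), wrong);
    }

    // Tries to undo the wrong decoding by re-encoding with the same wrong charset.
    static String tryRepair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u6587";

        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost yet.
        String latin1 = decodeWrongly(original, StandardCharsets.ISO_8859_1);
        System.out.println(tryRepair(latin1, StandardCharsets.ISO_8859_1).equals(original)); // true

        // US-ASCII cannot represent bytes above 0x7F; they become U+FFFD and the data is gone.
        String ascii = decodeWrongly(original, StandardCharsets.US_ASCII);
        System.out.println(tryRepair(ascii, StandardCharsets.US_ASCII).equals(original));    // false
    }
}
```

The repair only works when the wrong charset happens to be lossless for the bytes involved, which is why fixing the decoding at the source is preferable to patching the string afterwards.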
Could you give an example of what's actually going wrong? Specify the Unicode values of the characters in the string you're receiving (e.g. by using toCharArray() and then converting each char to an int) and what you expected to receive.
EDIT:
To diagnose this, use something like this:
public static void dumpString(String text) {
    // Print the index and the numeric (decimal) value of each char in the string.
    for (int i = 0; i < text.length(); i++) {
        System.out.println(i + ": " + (int) text.charAt(i));
    }
}
Note that this will give the decimal value of each Unicode character (strictly, each UTF-16 code unit returned by charAt). If you have a handy hex library method around, you may want to use that to give you the hex value. The main point is that it will dump the Unicode characters in the string.
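For the hex variant, one possible sketch uses String.format from the standard library, so no extra hex helper is needed. The class name HexDump and the method dumpHex are made up for this example, and the "U+XXXX" layout is just one convenient choice.

```java
public class HexDump {
    // Formats each UTF-16 code unit of the string as "index: U+XXXX", one per line.
    static String dumpHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(i)
              .append(": ")
              .append(String.format("U+%04X", (int) text.charAt(i)))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "0: U+4E2D" and "1: U+6587" on separate lines.
        System.out.print(dumpHex("\u4E2D\u6587"));
    }
}
```

Comparing this output with the code points you expected makes it easier to see whether the string was already wrong when it arrived.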