Java UTF-8 strange behaviour


Question

I am trying to decode some UTF-8 strings in Java. These strings contain some combining Unicode characters, such as CC 88 (the UTF-8 encoding of U+0308, combining diaeresis). The byte sequence seems OK, according to http://www.fileformat.info/info/unicode/char/0308/index.htm

But the output after conversion to a String is invalid. Any ideas?

byte[] utf8 = { 105, -52, -120 };
System.out.print("{{");
for(int i = 0; i < utf8.length; ++i)
{
    int value = utf8[i] & 0xFF;
    System.out.print(Integer.toHexString(value));
}
System.out.println("}}");
System.out.println(">" + new String(utf8, "UTF-8"));

Output:


    {{69cc88}}
    >i?

Recommended answer

The console you're writing to (e.g. the Windows console) may not support Unicode and may mangle the characters. Console output is not a reliable representation of the data.
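As a rough illustration of how the `?` can appear, the sketch below encodes the decoded string into a charset that has no mapping for U+0308. ISO-8859-1 is used here only as a stand-in; the actual console encoding in the question is not known, and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {
    // Formats bytes as lowercase two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = { 105, -52, -120 };
        String text = new String(utf8, StandardCharsets.UTF_8);

        // Re-encoding as UTF-8 round-trips the original bytes.
        System.out.println(hex(text.getBytes(StandardCharsets.UTF_8)));      // 69 cc 88

        // ISO-8859-1 (an assumed stand-in for a limited console charset) cannot
        // represent U+0308, so the encoder substitutes '?' (0x3f).
        System.out.println(hex(text.getBytes(StandardCharsets.ISO_8859_1))); // 69 3f
    }
}
```

If the second line matches what the console shows, the String itself is fine and the loss happens only when it is encoded for output.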

Try writing the output to a file instead, making sure the writer uses the correct encoding (a plain FileWriter falls back to the platform default charset unless one is passed explicitly), then open the file in a Unicode-friendly editor.
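A minimal sketch of the file-based check, assuming Java 7 or later. `Files.newBufferedWriter` is used here in place of `FileWriter` so the charset is always explicit; the class name and the file name `utf8-output.txt` are placeholders chosen for this example.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8FileDump {
    // Decodes the bytes as UTF-8 and writes the resulting text to the target file,
    // again encoded as UTF-8.
    static void writeDecoded(byte[] utf8, Path target) throws IOException {
        String text = new String(utf8, StandardCharsets.UTF_8);
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = { 105, -52, -120 };
        Path target = Paths.get("utf8-output.txt"); // hypothetical file name
        writeDecoded(utf8, target);
    }
}
```

Opening the resulting file in an editor set to UTF-8 should show an `i` followed by the combining diaeresis, provided the editor's font can render combining marks.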

Alternatively, use a debugger to make sure the characters are what you expect. Just don't trust the console.
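If a debugger is not at hand, a similar check can be done by printing the decoded code points as plain ASCII, which any console can display. This is only a sketch, assuming Java 8 or later for `String.codePoints()`; the class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDump {
    // Decodes the bytes as UTF-8 and returns the code points as "U+XXXX" tokens
    // separated by single spaces.
    static String describe(byte[] utf8) {
        String text = new String(utf8, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = { 105, -52, -120 };
        System.out.println(describe(utf8)); // U+0069 U+0308
    }
}
```

Seeing `U+0069 U+0308` confirms that the bytes were decoded into `i` plus the combining diaeresis, regardless of how the console renders the string itself.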
