Java Strings Character Encoding - For French - Dutch Locales


Question

I have the following code:

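import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // Platform default charset (windows-1252 on the asker's machine)
        System.out.println(Charset.defaultCharset().toString());

        String accentedE = "é";

        // Encode as UTF-8, decode as UTF-8
        String utf8 = new String(accentedE.getBytes("utf-8"), Charset.forName("UTF-8"));
        System.out.println(utf8);
        // Encode with the platform default, decode as UTF-8
        utf8 = new String(accentedE.getBytes(), Charset.forName("UTF-8"));
        System.out.println(utf8);
        // Encode as UTF-8, decode with the platform default
        utf8 = new String(accentedE.getBytes("utf-8"));
        System.out.println(utf8);
        // Encode with the platform default, decode with the platform default
        utf8 = new String(accentedE.getBytes());
        System.out.println(utf8);
    }
}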

The output of the above is as follows:

windows-1252
é
?
é
é

Can someone help me understand what this code does, and why it produces this output?

Recommended answer

If you already have a String, there is no need to encode it and decode it right back: the string is already the result of someone having decoded raw bytes.

In the case of a string literal, that someone is the compiler, which reads your source file as raw bytes and decodes it using the encoding you have specified. If you physically saved your source file in Windows-1252 and the compiler decodes it as Windows-1252, all is well. If not, you need to fix this by declaring the correct encoding for the compiler to use when compiling your source...
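For javac, the source encoding is declared with the `-encoding` option, for example `javac -encoding UTF-8 Main.java`. As a minimal sketch of what the literal ends up being once it has been decoded correctly, the example below writes the character as a Unicode escape so that it does not depend on how the file was saved. The class and method names are made up for illustration.

```java
public class LiteralDemo {
    // The accented e written as a Unicode escape: the source file stays pure ASCII,
    // so the compiler's source encoding cannot change what this literal means.
    static String accentedE() {
        return "\u00e9";
    }

    public static void main(String[] args) {
        String s = accentedE();
        // A correctly decoded literal is one char with code point U+00E9.
        System.out.println(s.length());                             // prints 1
        System.out.println(Integer.toHexString(s.codePointAt(0)));  // prints e9
    }
}
```

If the same two checks on a plain "é" literal report a length other than 1, the compiler most likely decoded the source file with a different encoding than the one it was saved in.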

String utf8 = new String(accentedE.getBytes("utf-8"), Charset.forName("UTF-8"));

Does absolutely nothing. (Encode as UTF-8, decode as UTF-8 == no-op)

utf8 = new String(accentedE.getBytes(), Charset.forName("UTF-8"));

Encodes the string as Windows-1252 (the platform default here) and then decodes it as UTF-8. Bytes encoded as Windows-1252 must only be decoded as Windows-1252, otherwise you will get strange results. Here the single byte 0xE9 is not valid UTF-8, so the decoder substitutes a replacement character, which the Windows-1252 console then prints as ?.

utf8 = new String(accentedE.getBytes("utf-8"));

Encodes the string as UTF-8 and then decodes it as Windows-1252. The same principles apply as in the previous case.

utf8 = new String(accentedE.getBytes());

Does absolutely nothing. (Encode as Windows-1252, decode as Windows-1252 == no-op)
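The four cases can be reproduced independently of the platform default by naming both charsets explicitly. This is only a sketch: `RoundTripDemo`, `roundTrip` and `codePoints` are hypothetical names, and it assumes the JDK in use provides the `windows-1252` charset. Results are printed as code points so that the console encoding cannot distort them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encode text with one charset, then decode the resulting bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    // Render a string as space-separated code points, e.g. "U+00E9".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        Charset cp1252 = Charset.forName("windows-1252"); // assumed to be available
        String accentedE = "\u00e9";

        // Same charset both ways: no-op.
        System.out.println(codePoints(roundTrip(accentedE, utf8, utf8)));     // U+00E9
        // Byte E9 alone is malformed UTF-8: replaced by U+FFFD.
        System.out.println(codePoints(roundTrip(accentedE, cp1252, utf8)));   // U+FFFD
        // Bytes C3 A9 read as two Windows-1252 characters.
        System.out.println(codePoints(roundTrip(accentedE, utf8, cp1252)));   // U+00C3 U+00A9
        // Same charset both ways: no-op.
        System.out.println(codePoints(roundTrip(accentedE, cp1252, cp1252))); // U+00E9
    }
}
```

Only the two mismatched pairs change the string, and neither change is useful: one loses the character entirely, the other turns it into two unrelated characters.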

An analogy with integers that might be easier to understand:

int a = 555;
// The case of encoding as X and decoding right back as X
a = Integer.parseInt(String.valueOf(a), 10);
// a is still 555

int b = 555;
// The case of encoding as X and decoding right back as Y
b = Integer.parseInt(String.valueOf(b), 15);
// b is now 1205, i.e. a strange result

Both of these are useless, because we already had what we needed before running any of the code: the integer 555.

You need to encode a string into raw bytes when it leaves your system, and you need to decode raw bytes into a string when they come into your system. There is no need to encode and decode right back within the system.
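As a rough sketch of that boundary rule, the example below encodes only at the point where text leaves the program and decodes only where bytes arrive, naming the charset explicitly on both sides instead of relying on the platform default. The names `toWire` and `fromWire` are hypothetical, and UTF-8 is an assumed choice. Any charset works as long as both sides agree on it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BoundaryDemo {
    // The charset agreed on with the outside world (assumed to be UTF-8 here).
    static final Charset WIRE_CHARSET = StandardCharsets.UTF_8;

    // Leaving the system: String -> raw bytes.
    static byte[] toWire(String text) {
        return text.getBytes(WIRE_CHARSET);
    }

    // Entering the system: raw bytes -> String.
    static String fromWire(byte[] bytes) {
        return new String(bytes, WIRE_CHARSET);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";
        byte[] sent = toWire(text);           // what would be written to a file or socket
        String received = fromWire(sent);     // what the reading side would reconstruct
        System.out.println(sent.length);      // prints 5: the accented e takes two bytes in UTF-8
        System.out.println(received.equals(text)); // prints true
    }
}
```

Inside the program the value stays a String the whole time. The byte array exists only at the edges.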
