Java String encoding (UTF-8)

Views: 196 · Published: 2021/12/27 15:43:01 · Tags: java, string, encoding

I have come across this line of legacy code, which I am trying to figure out:

String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

As far as I can tell, it encodes and then decodes the string using the same charset.

How is this any different from the following?

String newString = oldString;

Is there any scenario in which the two lines would produce different output?

P.S.: Just to clarify, yes, I am aware of the excellent article on encoding by Joel Spolsky!

This could be a complicated way of writing:

String newString = new String(oldString);

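This shortens the String if the underlying char[] it uses is much longer than the string itself (as could happen in older JDKs, where substring() shared the parent's array).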

More specifically, however, it checks that every character can be encoded as UTF-8.

A String can contain some "characters" that cannot be encoded, and these are turned into "?".

Any unpaired surrogate, that is, a char between \uD800 and \uDFFF that is not part of a valid surrogate pair, cannot be encoded and will be turned into '?'.

String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));

prints

false
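
To make the difference easier to reproduce, here is a minimal self-contained sketch of the round trip. The class name `Utf8RoundTrip` and the helpers `roundTrip` and `canEncodeUtf8` are hypothetical names chosen for illustration; they are not part of the original question or answer. It uses `StandardCharsets.UTF_8` instead of the `"UTF-8"` string only to avoid the checked `UnsupportedEncodingException`. The `canEncodeUtf8` helper is one possible way to detect the problem up front rather than silently getting a "?".

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encode to UTF-8 bytes and decode them again, as the legacy line does.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Hypothetical helper: reports whether the whole string can be encoded
    // as UTF-8 without replacement (false for unpaired surrogates).
    static boolean canEncodeUtf8(String s) {
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String plain = "h\u00E9llo";            // ordinary text with an accented letter
        String validPair = "\uD83D\uDE00";      // a properly paired surrogate (one emoji)
        String loneSurrogate = "\uD800";        // an unpaired high surrogate

        System.out.println(roundTrip(plain).equals(plain));                 // true
        System.out.println(roundTrip(validPair).equals(validPair));         // true
        System.out.println(roundTrip(loneSurrogate).equals(loneSurrogate)); // false
        System.out.println(roundTrip(loneSurrogate).equals("?"));           // true
        System.out.println(canEncodeUtf8(loneSurrogate));                   // false
    }
}
```

For well-formed strings the round trip returns an equal string, so the legacy line only changes the content when the input contains unpaired surrogates.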
