Java String encoding (UTF-8)

Question

I have come across this line of legacy code, which I am trying to figure out:

String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

As far as I can understand, it is encoding and decoding using the same charset.
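The legacy line performs two separate steps: encode the String to bytes, then decode those bytes into a new String. A minimal sketch of those steps, assuming Java 7+ so that java.nio.charset.StandardCharsets is available (the class RoundTrip, the helper roundTrip and the sample text are made up for illustration). The Charset overloads used here also avoid the checked UnsupportedEncodingException that the String-named overloads declare:

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // The legacy line, split into its two steps (StandardCharsets needs Java 7+).
    static String roundTrip(String oldString) {
        // Step 1: encode the String's UTF-16 chars as UTF-8 bytes.
        byte[] utf8 = oldString.getBytes(StandardCharsets.UTF_8);
        // Step 2: decode the same bytes back into a brand-new String.
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample text; Unicode escape sequences keep the source file ASCII-only.
        String oldString = "Gr\u00FC\u00DFe, caf\u00E9";
        String newString = roundTrip(oldString);
        System.out.println(oldString.getBytes(StandardCharsets.UTF_8).length + " bytes"); // 14 bytes
        System.out.println("equal: " + newString.equals(oldString));                     // equal: true
    }
}
```

For ordinary text like this, the round trip returns an equal String, so in the common case the two lines behave the same.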

How is this different from the following?

String newString = oldString;

Is there any scenario in which the two lines give different outputs?

p.s.: Just to clarify, yes I am aware of the excellent article on encoding by Joel Spolsky !

Recommended answer

This could be a complicated way of doing

String newString = new String(oldString);

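This shortens the String if the underlying char[] it uses is much longer than its contents, which could happen in Java releases before 7u6, where substring() shared the original string's backing array.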

More specifically, however, it checks that every character can be UTF-8 encoded.

There are some "characters" a String can hold which cannot be encoded, and these are turned into '?'.

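Any unpaired surrogate, that is, a char between \uD800 and \uDFFF that is not part of a valid high/low surrogate pair, cannot be encoded and will be turned into '?'. A correctly paired surrogate (for example the two chars that make up an emoji) encodes to a four-byte UTF-8 sequence and survives the round trip.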
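A small sketch contrasting a lone surrogate with a valid pair (the class SurrogateRoundTrip and helper roundTrip are hypothetical names; it uses StandardCharsets.UTF_8 from Java 7+, and String.getBytes(Charset) replaces unencodable input with the charset's default replacement, which for UTF-8 is '?'):

```java
import java.nio.charset.StandardCharsets;

public class SurrogateRoundTrip {
    // Same encode/decode round trip as the legacy line.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lone = "\uD800";        // unpaired high surrogate
        String pair = "\uD83D\uDE00";  // U+1F600 as a valid high/low pair
        System.out.println(roundTrip(lone));                // ?
        System.out.println(roundTrip(lone).equals(lone));   // false
        System.out.println(roundTrip(pair).equals(pair));   // true
        System.out.println(pair.getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```

Only the unpaired surrogate is damaged; the paired one comes back unchanged.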

String oldString = "\uD800"; // a lone (unpaired) high surrogate
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));

Prints:

false
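If silently turning such characters into '?' is not acceptable, one option (a sketch, not something the answer above proposes; the class StrictUtf8 and the helpers canRoundTrip and encodeStrict are hypothetical names) is to use a java.nio.charset.CharsetEncoder directly, whose REPORT action throws a CharacterCodingException instead of substituting a replacement:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Hypothetical helper: true if the whole String (surrogate pairs included) is encodable as UTF-8.
    static boolean canRoundTrip(String s) {
        // A fresh encoder per call, because CharsetEncoder instances are not thread-safe.
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    // Hypothetical helper: encode strictly, throwing instead of substituting '?'.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s)); // returned buffer is ready to read
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(canRoundTrip("\uD83D\uDE00")); // true
        System.out.println(canRoundTrip("\uD800"));       // false
        try {
            encodeStrict("\uD800");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

The design choice here is to fail loudly: callers learn about an unpaired surrogate at encode time rather than discovering a '?' in stored data later.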