What is Java's internal representation for String? Modified UTF-8? UTF-16?

Question

I searched for Java's internal representation of String and found two sources that both look reliable but contradict each other.

The first is:

http://www.codeguru.com/cpp/misc/misc/multi-languagesupport/article.php/c10451

It says:

Java uses UTF-16 for the internal text representation and supports a non-standard modification of UTF-8 for string serialization.

The other is:

http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

It says:

Tcl also uses the same modified UTF-8[25] as Java for internal representation of Unicode data, but uses strict CESU-8 for external data.

Modified UTF-8 or UTF-16? Please let me know which one is correct, and how many bytes Java uses for a char in memory.

Recommended answer

Java uses UTF-16 for the internal text representation

The representation for String, StringBuilder, etc. in Java is UTF-16.

https://docs.oracle.com/javase/8/docs/technotes/guides/intl/overview.html

How is text represented in the Java platform?

The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences.
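
To make the "UTF-16 sequence" point concrete, here is a minimal sketch. The class name Utf16Demo and its helper methods are made up for illustration; only the String, Character, and StandardCharsets calls are standard library APIs. It uses a string with one character outside the range U+0000 to U+FFFF, so the number of char values and the number of code points differ.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that String exposes UTF-16 code units.
class Utf16Demo {
    // "A" followed by U+1F600, written as its UTF-16 surrogate pair.
    static final String TEXT = "A\uD83D\uDE00";

    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Size in bytes when encoded as UTF-16 big-endian, without a byte-order mark.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(TEXT));   // 3
        System.out.println(codePoints(TEXT));  // 2
        System.out.println(utf16Bytes(TEXT));  // 6
        System.out.println(Character.isHighSurrogate(TEXT.charAt(1))); // true
    }
}
```

Note that length() and charAt() count code units, not characters as a reader perceives them, which is the practical consequence of the UTF-16 model.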

At the JVM level, if you are using -XX:+UseCompressedStrings (which is the default for some updates of Java 6), the actual in-memory representation can be 8-bit ISO-8859-1, but only for strings that do not need UTF-16 encoding. Since Java 9, compact strings (enabled by default) do much the same: a String whose characters all fit in ISO-8859-1 is stored with one byte per character, while the API still behaves as UTF-16.

http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

and supports a non-standard modification of UTF-8 for string serialization.

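Serialized Strings use modified UTF-8 by default: this is the format written by DataOutputStream.writeUTF and by Java object serialization. It differs from standard UTF-8 in that U+0000 is encoded as two bytes, and supplementary characters are encoded as a surrogate pair of three bytes each rather than as a single four-byte sequence.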
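
For comparison, here is a minimal sketch of how the output of DataOutputStream.writeUTF (which writes Java's modified UTF-8) differs from standard UTF-8 obtained through String.getBytes. The class name ModifiedUtf8Demo and its helper methods are made up for illustration; the stream and charset calls are standard library APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares modified UTF-8 with standard UTF-8.
class ModifiedUtf8Demo {
    // Encodes s with writeUTF and drops the 2-byte length prefix it writes first.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and a short string.
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buffer.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Encodes s as standard UTF-8.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nul = "\u0000";          // U+0000
        String emoji = "\uD83D\uDE00";  // U+1F600 as a surrogate pair

        System.out.println(modifiedUtf8(nul).length);    // 2 (bytes C0 80)
        System.out.println(standardUtf8(nul).length);    // 1 (byte 00)
        System.out.println(modifiedUtf8(emoji).length);  // 6 (two 3-byte sequences)
        System.out.println(standardUtf8(emoji).length);  // 4 (one 4-byte sequence)
    }
}
```

For text that contains neither U+0000 nor supplementary characters, the two encodings produce identical bytes.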

And how many bytes does Java use for a char in memory?

A char is always two bytes, if you ignore the need for padding in an Object.

Note: a code point (which allows characters above 65535) can use one or two chars, i.e. 2 or 4 bytes.
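
The 2-or-4-byte rule can be checked with the Character class. This is a minimal sketch: the class name CharSizeDemo and the method bytesFor are made up for illustration, while Character.BYTES (available since Java 8), Character.charCount, and Character.toChars are standard library members.

```java
// Hypothetical demo class: bytes needed to hold one code point as Java chars.
class CharSizeDemo {
    // One char is Character.BYTES (2) bytes; a code point needs 1 or 2 chars.
    static int bytesFor(int codePoint) {
        return Character.charCount(codePoint) * Character.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);                   // 2
        System.out.println(bytesFor('A'));                     // 2
        System.out.println(bytesFor(0x1F600));                 // 4
        System.out.println(Character.toChars(0x1F600).length); // 2
    }
}
```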
