What is Java's internal representation for String? Modified UTF-8? UTF-16?


Problem description




I searched for Java's internal representation of String and found two sources that both look reliable but contradict each other.

One is:

http://www.codeguru.com/cpp/misc/misc/multi-lingualsupport/article.php/c10451

and it says:

Java uses UTF-16 for the internal text representation and supports a non-standard modification of UTF-8 for string serialization.

The other is:

http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

and it says:

Tcl also uses the same modified UTF-8[25] as Java for internal representation of Unicode data, but uses strict CESU-8 for external data.

Modified UTF-8? Or UTF-16? Which one is correct? And how many bytes does Java use for a char in memory?

Please let me know which one is correct and how many bytes it uses.

Solution

Java uses UTF-16 for the internal text representation

The representation for String, StringBuilder, etc. in Java is UTF-16.

http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp

How is text represented in the Java platform?

The Java programming language is based on the Unicode character set, and several libraries implement the Unicode standard. The primitive data type char in the Java programming language is an unsigned 16-bit integer that can represent a Unicode code point in the range U+0000 to U+FFFF, or the code units of UTF-16. The various types and classes in the Java platform that represent character sequences - char[], implementations of java.lang.CharSequence (such as the String class), and implementations of java.text.CharacterIterator - are UTF-16 sequences.
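The distinction between UTF-16 code units and Unicode code points can be observed directly from the String API. The following is a minimal sketch; the class name `Utf16Units` and its helper methods are made up for illustration, while `length()`, `codePointCount` and `getBytes` are standard library calls.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares UTF-16 code units with Unicode code points.
final class Utf16Units {

    // length() reports the number of UTF-16 code units (chars), not characters.
    static int codeUnits(String s) {
        return s.length();
    }

    // codePointCount reports the number of Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Encoding to UTF-16BE (no byte order mark) yields two bytes per char.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String bmp = "A\u00e9\u4e2d";          // three characters in U+0000..U+FFFF
        String supplementary = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println(codeUnits(bmp) + " chars, " + codePoints(bmp) + " code points");
        System.out.println(codeUnits(supplementary) + " chars, "
                + codePoints(supplementary) + " code points, "
                + utf16Bytes(supplementary) + " bytes as UTF-16");
    }
}
```

For the supplementary character the char count is 2 while the code point count is 1, which is what "UTF-16 sequence" means in practice.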

At the JVM level, if you are using -XX:+UseCompressedStrings (which is the default for some updates of Java 6), the actual in-memory representation can be 8-bit ISO-8859-1, but only for strings which do not need UTF-16 encoding.

http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

and supports a non-standard modification of UTF-8 for string serialization.

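Serialized Strings use modified UTF-8 by default, the format written by DataOutput.writeUTF. It differs from standard UTF-8 in two ways: U+0000 is encoded as two bytes, and characters above U+FFFF are written as two three-byte surrogate encodings instead of one four-byte sequence.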
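The "non-standard modification of UTF-8" can be seen by comparing the bytes produced by `DataOutputStream.writeUTF` with those from `String.getBytes(StandardCharsets.UTF_8)`. This is a small sketch under the assumption that `writeUTF` is representative of how serialized strings are written; the class and method names are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares modified UTF-8 with standard UTF-8 sizes.
final class ModifiedUtf8Demo {

    // Size of the modified UTF-8 payload written by writeUTF,
    // excluding the 2-byte length prefix that writeUTF emits first.
    static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size() - 2;
    }

    // Size of the same string in standard UTF-8.
    static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String nul = "\0";                     // U+0000
        String supplementary = "\uD83D\uDE00"; // U+1F600

        // ASCII text is identical in both encodings.
        System.out.println(modifiedUtf8Length(ascii) + " vs " + standardUtf8Length(ascii));
        // U+0000 takes two bytes in modified UTF-8, one in standard UTF-8.
        System.out.println(modifiedUtf8Length(nul) + " vs " + standardUtf8Length(nul));
        // A supplementary character takes six bytes (two surrogates, three bytes each)
        // in modified UTF-8, four in standard UTF-8.
        System.out.println(modifiedUtf8Length(supplementary) + " vs "
                + standardUtf8Length(supplementary));
    }
}
```

Data written with `writeUTF` should therefore be read back with `readUTF` rather than decoded as ordinary UTF-8.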

And how many bytes does Java use for a char in memory?

A char is always two bytes, if you ignore the need for padding in an Object.

Note: a code point (which allows characters > 65535) can use one or two chars, i.e. 2 or 4 bytes.
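The 2-or-4-byte rule can be computed per code point with `Character.charCount` and `Character.BYTES` (available since Java 8). The sketch below uses an illustrative class name, `CodePointSize`, and counts only the char data itself, ignoring any object header or padding.

```java
// Hypothetical demo class: bytes needed to hold one code point as UTF-16 chars.
final class CodePointSize {

    // charCount returns 1 for U+0000..U+FFFF and 2 for supplementary code points;
    // Character.BYTES is the size of one char in bytes (2).
    static int utf16Bytes(int codePoint) {
        return Character.charCount(codePoint) * Character.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(utf16Bytes('A'));     // one char
        System.out.println(utf16Bytes(0x4E2D));  // one char, still in the BMP
        System.out.println(utf16Bytes(0x1F600)); // two chars (surrogate pair)
    }
}
```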
