Java String internal representation
Question
I understand that Java's internal representation of a String is UTF-16. What is the actual in-memory representation of a Java String?
Also, I know that in a UTF-16 string, each character is encoded with one or two 16-bit code units.
However, when I debug the following Java code
String hello = "Hello";
the variable hello holds an array of 5 bytes, 0x48, 0x65, 0x6C, 0x6C, 0x6F, which is ASCII for "Hello".
How can this be?
I took a gcore dump of a small Java process running this code:
class Hi {
    public static void main(String[] args) {
        String hello = "Hello";
        try {
            // Keep the process alive long enough to take the dump.
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
I took the memory dump with gcore on Ubuntu, using jps to get the pid and passing that pid to gcore.
Using a hex editor, I found 48 65 6C 6C 6F in the dump, so the string is somewhere in memory as ASCII.
But I also found 48 00 65 00 6C 00 6C, which is part of the UTF-16 (little-endian) representation of the String.
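The two byte patterns from the dump can be reproduced with a minimal sketch like the one below. The class name HelloBytes and the helper hex are hypothetical names introduced only for illustration. The sketch encodes "Hello" explicitly with two charsets; it does not read the JVM's internal storage. One plausible explanation for seeing both patterns, which the question itself does not confirm, is that JDK 9 and later use compact strings (JEP 254): a String is backed by a byte[] with one byte per character when every character fits in Latin-1, and with two bytes per character (UTF-16) otherwise.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question.
class HelloBytes {
    // Formats bytes as space-separated upper-case hex, like a row in a hex editor.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hello = "Hello";

        // One byte per character (identical to ASCII for these letters).
        System.out.println(hex(hello.getBytes(StandardCharsets.ISO_8859_1)));
        // prints: 48 65 6C 6C 6F

        // Two bytes per character, low byte first (UTF-16 little-endian).
        System.out.println(hex(hello.getBytes(StandardCharsets.UTF_16LE)));
        // prints: 48 00 65 00 6C 00 6C 00 6F 00

        // A character outside the Basic Multilingual Plane takes two 16-bit code units.
        String grin = "\uD83D\uDE00"; // U+1F600 as a surrogate pair
        System.out.println(grin.length());                          // prints: 2
        System.out.println(grin.codePointCount(0, grin.length()));  // prints: 1
    }
}
```

Whatever the internal storage, the public API (length(), charAt(), and so on) still behaves in terms of UTF-16 code units, so the choice of backing format is an implementation detail of the JVM.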