Understanding String heap size in JavaScript / V8

Question

Does anyone have a good understanding/explanation of how the heap size of strings is determined in JavaScript with Chrome (V8)?

Some examples of what I see in a heap dump:

1) Multiple copies of an identical two-character string (i.e. "dt") with different @ object IDs, all designated as OneByteStrings. The heap dump says each copy has a shallow and retained size of 32 bytes. It isn't clear how a two-character string can have a retained size of 32 bytes, or why the strings don't appear to be interned.

2) A long object path string, 78 characters long. Every character would be a single byte in UTF-8. It is classified as an InternalizedString and has a retained size of 184 bytes. Even with a 2-byte character encoding (78 × 2 = 156 bytes), that leaves 28 bytes unaccounted for. Why are these path strings taking up so much space? I could imagine another 4 bytes (maybe 8) being used for an address and another 4 for storing the string length, but that still leaves 16 bytes even with a 2-byte character encoding.

Answer

Internally, V8 has a number of different representations for strings:

  • SeqOneByteString: The simplest representation: a few header fields followed by the string's bytes (not UTF-8 encoded; it can only contain characters from the first 256 Unicode code points).
  • SeqTwoByteString: The same, but uses two bytes per character (with surrogate pairs for Unicode characters that can't be represented in two bytes).
  • SlicedString: A substring of some other string. Contains a pointer to the "parent" string, plus an offset and a length.
  • ConsString: The result of concatenating two strings (if the result is over a certain size). Contains pointers to both strings, each of which may itself be any of these string types.
  • ExternalString: Used for strings that have been passed in from outside of V8.

"Internalized" is just a flag, the actual string representation could be any of the above.

All of these have a common parent class String, whose parent is Name, whose parent is HeapObject (which is the root of the V8 class hierarchy for objects allocated on the V8 heap).

  • HeapObject has one field: the pointer to its Map (there's a good explanation of these here).
  • Name adds one additional field: a hash value.
  • String adds another field: the length.

On a 32-bit system, each of these is 4 bytes. On a 64-bit system, each one is 8 bytes.

If you're on a 64-bit system then the minimum size of a SeqOneByteString will be 32 bytes: 24 bytes for the header fields described above plus at least one byte for the string data, rounded up to a multiple of 8.
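As a worked check of that arithmetic, the shallow size of a flat sequential string can be written as follows. Here $p$ is the pointer size (8 on 64-bit, 4 on 32-bit), $n$ is the number of characters, and $b$ is the bytes per character (1 for SeqOneByteString, 2 for SeqTwoByteString). The three header slots are map, hash, and length, as described above. The 32-bit line assumes objects there are rounded up to a multiple of 4. The answer only states the 64-bit rounding, so treat that line as an assumption. The 64-bit result matches the 32 bytes reported for each "dt" copy in the question.

```latex
\begin{aligned}
\text{size}(n, b, p) &= \left\lceil \frac{3p + n\,b}{p} \right\rceil \cdot p \\
\text{size}(2, 1, 8) &= \left\lceil \frac{24 + 2}{8} \right\rceil \cdot 8 = 4 \cdot 8 = 32 \\
\text{size}(2, 1, 4) &= \left\lceil \frac{12 + 2}{4} \right\rceil \cdot 4 = 4 \cdot 4 = 16
\end{aligned}
```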

Regarding your second question, it's difficult to say exactly what's going on. It could be that the string is using a 2-byte representation and its header fields are pushing up the size above what you are expecting, or it could be that it's a ConsString or a SlicedString (whose retained sizes would include the strings that it points to).
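The two hypotheses can be compared with a toy model of shallow and retained size. This is a sketch under simplifying assumptions, not V8's real object layout:

- it models a 64-bit build;
- every header field and every extra pointer or offset field takes one 8-byte slot;
- objects are rounded up to 8 bytes;
- each child string is referenced only by its parent.

Under these assumptions, a flat two-byte representation of the 78-character path comes out at exactly 184 bytes (24 + 156 = 180, rounded up to 184). That is consistent with the size in the question, though it does not prove that this representation is in use. Actual sizes depend on the V8 version and build.

```typescript
// Toy model of shallow vs. retained size for V8 string representations.
// ASSUMPTIONS (simplifications, not V8 source): 64-bit build, every header
// field and every extra pointer/offset field takes one 8-byte slot, objects
// are rounded up to 8 bytes, and each child is referenced only by its parent.

const SLOT = 8;          // pointer size on a 64-bit build
const HEADER = 3 * SLOT; // map + hash + length

type V8String =
  | { kind: "seq"; length: number; bytesPerChar: 1 | 2 } // SeqOneByteString / SeqTwoByteString
  | { kind: "cons"; first: V8String; second: V8String }  // ConsString
  | { kind: "sliced"; parent: V8String };                // SlicedString

// Round a byte count up to the next multiple of the slot size.
function align(size: number): number {
  return Math.ceil(size / SLOT) * SLOT;
}

// Size of the object itself (a heap snapshot's "shallow size").
function shallowSize(s: V8String): number {
  switch (s.kind) {
    case "seq":
      return align(HEADER + s.length * s.bytesPerChar);
    case "cons":
      return align(HEADER + 2 * SLOT); // pointers to the two halves
    case "sliced":
      return align(HEADER + 2 * SLOT); // pointer to the parent + offset
  }
}

// Shallow size plus the strings this one points to
// (valid only under the "referenced only by its parent" assumption).
function retainedSize(s: V8String): number {
  switch (s.kind) {
    case "seq":
      return shallowSize(s);
    case "cons":
      return shallowSize(s) + retainedSize(s.first) + retainedSize(s.second);
    case "sliced":
      return shallowSize(s) + retainedSize(s.parent);
  }
}

// The 78-character path string from the question, under each hypothesis.
console.log(retainedSize({ kind: "seq", length: 78, bytesPerChar: 1 })); // 104
console.log(retainedSize({ kind: "seq", length: 78, bytesPerChar: 2 })); // 184
console.log(
  retainedSize({
    kind: "cons",
    first: { kind: "seq", length: 39, bytesPerChar: 1 },
    second: { kind: "seq", length: 39, bytesPerChar: 1 },
  }),
); // 168
```

In this model a ConsString's own shallow size stays small (40 bytes). Its retained size grows with the strings it points to, which is why the answer notes that a ConsString or SlicedString can report a larger retained size than its character count suggests.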

V8 doesn't internalize strings most of the time - it internalizes string constants and identifier names that it finds during parsing, and strings that are used as object property keys, and probably a few other cases.
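To reproduce the first observation from the question, strings can be built at run time rather than written as literals. Per the answer, such strings are generally not internalized, so identical copies may show up as separate heap objects in a snapshot. The snippet below is a sketch, and `makeCopies` is a hypothetical helper. It also assumes that V8 copies very short substrings into new flat strings instead of creating SlicedStrings. That is an internal detail that may differ between versions.

```typescript
// Builds many identical two-character strings at run time.
// Per the answer, V8 internalizes parsed constants and property keys, not
// strings computed like this, so each copy can be its own heap object.
// ASSUMPTION: V8 copies substrings this short into new flat strings rather
// than creating SlicedStrings (an internal, version-dependent detail).
function makeCopies(count: number): string[] {
  const source = "dt".repeat(count); // "dtdtdt..." built at run time
  const result: string[] = [];
  for (let i = 0; i < count; i++) {
    result.push(source.slice(2 * i, 2 * i + 2)); // every element equals "dt"
  }
  return result;
}

// Keep a reference alive, then take a heap snapshot in Chrome DevTools
// (Memory panel) and search for "dt".
const copies = makeCopies(1000);
console.log(copies.length, copies[0]); // prints: 1000 dt
```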
