Understanding String heap size in JavaScript / V8

Views: 98 | Published: 2019/6/6 13:09:09 | Tags: javascript, string, v8

Question

Does anyone have a good understanding/explanation of how the heap size of strings is determined in JavaScript with Chrome (V8)?

Some examples of what I see in a heap dump:

1) Multiple copies of an identical 2-character string (i.e. "dt") with different @ object IDs, all designated as OneByteStrings. The heap dump says each copy has a shallow and retained size of 32 bytes. It isn't clear how a two-byte string has a retained size of 32, or why the strings don't appear to be interned.

2) A long object path string which is 78 characters long. All characters would be a single byte in UTF-8. It is classified as an InternalizedString. It has a 184-byte retained size. Even with a 2-byte character encoding (156 bytes), that would still not account for the remaining 28 bytes. Why are these path strings taking up so much space? I could imagine another 4 bytes (maybe 8) being used for an address and another 4 for storing the string length, but that still leaves 16 bytes even with a 2-byte character encoding.

Recommended answer

Internally, V8 has a number of different representations for strings:
 
SeqOneByteString: The simplest. Contains a few header fields and then the string's bytes (not UTF-8 encoded; can only contain characters in the first 256 Unicode code points).
SeqTwoByteString: Same, but uses two bytes for each character (using surrogate pairs to represent Unicode characters that can't be represented in two bytes).
SlicedString: A substring of some other string. Contains a pointer to the "parent" string plus an offset and a length.
ConsString: The result of adding two strings (if over a certain size). Contains pointers to both strings (which may themselves be any of these types of strings).
ExternalString: Used for strings that have been passed in from outside of V8.
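
The one-byte versus two-byte distinction in the list above can be approximated from JavaScript by looking at UTF-16 code units. The sketch below is only a model: `fitsOneByte` is a hypothetical helper, not a V8 API, and V8 chooses the actual representation internally, so a string that passes this check may still be stored with two bytes per character.

```javascript
// Hypothetical helper (not a V8 API): returns true when every UTF-16 code
// unit of the string is below 256, i.e. the string *could* be stored with
// one byte per character.
function fitsOneByte(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

console.log(fitsOneByte("dt"));                 // true
console.log(fitsOneByte("caf\u00e9"));          // true: U+00E9 is below 256
console.log(fitsOneByte("\u65e5\u672c\u8a9e")); // false: CJK characters need two bytes
console.log("\u{1F600}".length);                // 2: one emoji is stored as a surrogate pair
```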

"Internalized" is just a flag; the actual string representation could be any of the above.

All of these have a common parent class String, whose parent is Name, whose parent is HeapObject (the root of the V8 class hierarchy for objects allocated on the V8 heap).
 
HeapObject has one field: the pointer to its Map (there's a good explanation of these here).
Name adds one additional field: a hash value.
String adds another field: the length.

On a 32-bit system, each of these fields is 4 bytes. On a 64-bit system, each one is 8 bytes.

If you're on a 64-bit system, then the minimum size of a SeqOneByteString will be 32 bytes: 24 bytes for the three header fields described above plus at least one byte for the string data, rounded up to a multiple of 8. For the 2-character "dt" string, that is 24 + 2 = 26 bytes, which rounds up to 32.
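
The size rule just described can be written as a small estimator. This is a minimal sketch assuming the layout given in this answer (three pointer-sized header fields followed by the character data); `estimateSeqStringSize` is a hypothetical name, rounding to the pointer size on 32-bit systems is an assumption, and other V8 versions may lay strings out differently.

```javascript
// Assumed layout, taken from the answer: three header fields (Map pointer,
// hash, length), each one pointer wide, followed by the character data,
// with the total rounded up to a multiple of the pointer size.
function estimateSeqStringSize(length, bytesPerChar, pointerSize) {
  const headerSize = 3 * pointerSize;
  const unaligned = headerSize + length * bytesPerChar;
  return Math.ceil(unaligned / pointerSize) * pointerSize;
}

console.log(estimateSeqStringSize(2, 1, 8)); // 32: the "dt" case on 64-bit
console.log(estimateSeqStringSize(1, 1, 8)); // 32: a single one-byte character on 64-bit
console.log(estimateSeqStringSize(2, 1, 4)); // 16: "dt" under the assumed 32-bit layout
```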

Regarding your second question, it's difficult to say exactly what's going on. It could be that the string is using a 2-byte representation and its header fields are pushing the size above what you are expecting, or it could be that it's a ConsString or a SlicedString (whose retained sizes would include the strings they point to).
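
To make the first possibility concrete, the arithmetic can be checked against the 184 bytes reported in the question. This is a sketch using the 64-bit layout described earlier in this answer (a 24-byte header, total rounded up to a multiple of 8); the constants are assumptions taken from that description, not measured values.

```javascript
const HEADER_BYTES = 24; // assumed: Map pointer + hash + length, 8 bytes each
const ALIGNMENT = 8;     // assumed: sizes are rounded up to a multiple of 8
const LENGTH = 78;       // characters in the path string from the question

const roundUp = (n) => Math.ceil(n / ALIGNMENT) * ALIGNMENT;

const oneByteSize = roundUp(HEADER_BYTES + LENGTH * 1); // 102 rounds up to 104
const twoByteSize = roundUp(HEADER_BYTES + LENGTH * 2); // 180 rounds up to 184

console.log(oneByteSize); // 104
console.log(twoByteSize); // 184
```

Under these assumptions the two-byte estimate equals the 184 bytes shown in the heap dump, which is consistent with the two-byte explanation but does not rule out the ConsString or SlicedString cases.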

V8 doesn't internalize strings most of the time. It internalizes string constants and identifier names that it finds during parsing, strings that are used as object property keys, and probably a few other cases.
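
As a rough illustration of what internalizing means, the sketch below models a table that hands out one canonical object per distinct string content. `HeapString` and `InternTable` are hypothetical names made up for this example; this is not how V8 implements its string table, and JavaScript code cannot observe whether V8 has internalized a given string.

```javascript
// Stand-in for a heap-allocated string: two HeapString objects with the same
// content are still distinct objects, like the duplicate "dt" entries.
class HeapString {
  constructor(content) {
    this.content = content;
  }
}

class InternTable {
  constructor() {
    this.table = new Map(); // content -> canonical HeapString
  }

  // Returns the canonical object for this content, registering the
  // argument as canonical if the content has not been seen before.
  intern(heapString) {
    const existing = this.table.get(heapString.content);
    if (existing !== undefined) return existing;
    this.table.set(heapString.content, heapString);
    return heapString;
  }
}

const a = new HeapString("dt");
const b = new HeapString("dt");
console.log(a === b); // false: two separate copies, as in the heap dump

const table = new InternTable();
console.log(table.intern(a) === table.intern(b)); // true: one canonical copy
```

Strings that never pass through such a table (for example, values built at run time) stay as separate copies, which would match the duplicate "dt" entries in the question.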
