Memory overhead of typed arrays vs strings

Question

I am trying to reduce the memory usage of a JavaScript web application that keeps a lot of information in memory as a large number of small strings. When I changed the code to use Uint8Array instead of String, I noticed that memory usage went up.

For example, consider the following code that creates many small strings:

// (1000000 strings) x (10 characters)
var a=[];
for (let i=0; i<1000000; i++)
    a.push("a".repeat(10).toUpperCase());

If you put it in an empty page and let the memory usage settle for a few seconds, it settles at 70 MiB on Google Chrome. On the other hand, the following code:

// (1000000 arrays) x (10 bytes)
var a=[];
for (let i=0; i<1000000; i++)
    a.push(new Uint8Array(10));

uses 233 MiB of memory. An empty page without any code uses about 20 MiB. On the other hand, if I create a small number of large strings/arrays, the difference becomes smaller and in the case of a single string/array with 10000000 characters/entries, the memory usage is virtually identical.

So why do typed arrays have such a large memory overhead?

Recommended answer

V8 developer here. Your conclusion makes sense: if you compare characters in a string to elements in a Uint8Array, the string will have less overhead. TypedArrays are great at providing fast access to typed elements; however, having a large number of small TypedArrays is not memory efficient.

The difference is in the object header size for strings and typed arrays.

For a string, the object header is:


  1. hidden class pointer
  2. hash
  3. length
  4. payload

where the payload is rounded up to pointer size alignment, so 16 bytes in this case.
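
The rounding step can be checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in this answer, assuming 8-byte fields (a 64-bit build) and one byte per character; `alignUp` and the constants are illustrative names, not V8 internals.

```javascript
// Round a byte count up to the next multiple of the pointer size.
function alignUp(bytes, pointerSize) {
  return Math.ceil(bytes / pointerSize) * pointerSize;
}

const POINTER_SIZE = 8;   // assumption: 64-bit build, 8 bytes per field
const HEADER_FIELDS = 3;  // hidden class pointer, hash, length

// 10 one-byte characters, rounded up to pointer size alignment.
const payloadBytes = alignUp(10, POINTER_SIZE);

// Header fields plus the aligned payload.
const stringBytes = HEADER_FIELDS * POINTER_SIZE + payloadBytes;

console.log(payloadBytes, stringBytes); // 16 40
```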

For a Uint8Array, you need the following:


  1. hidden class pointer
  2. properties pointer (unused)
  3. elements pointer (see below)
  4. array buffer pointer (see below)
  5. offset into array buffer
  6. byte length
  7. length of view into array buffer
  8. length (user-visible)
  9. embedder field #1
  10. embedder field #2

  array buffer: hidden class pointer
  array buffer: embedder field #2
  elements object: hidden class pointer

where, again, the payload is rounded up to pointer size alignment, so it consumes 16 bytes here.

In summary, each string consumes 5*8 = 40 bytes and each typed array consumes 26*8 = 208 bytes. That does seem like a lot of overhead; the reason is the various flexible options that TypedArrays provide (they can be overlapping views into ArrayBuffers, which can be allocated directly from JavaScript, or shared with WebGL and whatnot, etc).
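
As a rough sanity check, these per-object sizes can be scaled up to the 1,000,000 objects in the question and compared with the measurements reported there (70 MiB and 233 MiB against a baseline of about 20 MiB). The sketch below assumes that the outer array `a` adds one 8-byte slot per element and ignores any other bookkeeping, so it is an estimate rather than an exact accounting.

```javascript
const MIB = 1024 * 1024;
const COUNT = 1000000;                 // objects created in the question
const BYTES_PER_STRING = 5 * 8;        // from the summary above
const BYTES_PER_TYPED_ARRAY = 26 * 8;  // from the summary above
const SLOT_BYTES = 8;                  // assumption: one pointer per entry of `a`

// Estimated footprint in MiB for COUNT objects plus their slots in `a`.
function estimateMiB(bytesPerObject) {
  return (COUNT * (bytesPerObject + SLOT_BYTES)) / MIB;
}

const stringsMiB = estimateMiB(BYTES_PER_STRING);
const typedArraysMiB = estimateMiB(BYTES_PER_TYPED_ARRAY);

console.log(stringsMiB.toFixed(1), typedArraysMiB.toFixed(1)); // 45.8 206.0
console.log(70 - 20, 233 - 20); // measured above baseline: 50 213
```

Both estimates land within a few MiB of the measured growth, which is consistent with the per-object header accounting.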

(It's not about "optimizing memory allocation" nor being "better at garbage collecting strings" -- since you're holding on to all the objects, GC does not play a role.)
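
Since the cost is per object, one way to keep binary data without paying a header for every record is to pack all records into a single large Uint8Array and address them by offset. The question already observed that a single large array uses about as much memory as a single large string. The following is a minimal sketch under the assumption that every record has the same fixed size; `store`, `writeRecord`, and `readRecord` are hypothetical names, not part of the original code.

```javascript
const RECORD_SIZE = 10;        // bytes per record, as in the question
const RECORD_COUNT = 1000000;

// One backing array for all records, so only one typed array header is paid.
const store = new Uint8Array(RECORD_SIZE * RECORD_COUNT);

// Copy exactly RECORD_SIZE byte values into record `index`.
function writeRecord(index, bytes) {
  store.set(bytes, index * RECORD_SIZE);
}

// Return a view of record `index`. The view shares memory with `store`;
// create it on demand instead of keeping a million of them around.
function readRecord(index) {
  const start = index * RECORD_SIZE;
  return store.subarray(start, start + RECORD_SIZE);
}

writeRecord(42, [65, 65, 65, 65, 65, 65, 65, 65, 65, 65]);
console.log(String.fromCharCode(...readRecord(42))); // "AAAAAAAAAA"
```

Each call to `readRecord` allocates a short-lived typed array, so code that only needs individual bytes can index `store` directly instead.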
