How are the JavaScript Arrays internally resizing?


Problem description


I've been trying to implement a collection-type class (similar to the List found in C#) in JavaScript that has some custom functionality. I also wanted it to be somewhat optimized (I've read some articles on how to properly use JavaScript Arrays).

I thought to myself "if we don't define an initial size to an Array and we keep adding objects to it, internally it will have to allocate a new size for each insertion, that must be slow. I can avoid this by allocating a new size myself (changing the array length), somewhat similar to how it is done in C#, doubling in size whenever the max capacity is reached (I know it's not this trivial but it's a start)".

I tried to implement this idea and found out that it is way slower (about 10 times slower):

// This simplified approach of my implementation is faster...
var array = [];
var counter = 0;
function addItem(newItem) {
    array[++counter] = newItem;
}

// ...then this version that resizes the array when a limit is reached
var array = [];
array.length = INITIAL_SIZE;
/*
 Alternatively
 var array = new Array(INITIAL_SIZE);
*/
var counter = 0;
function addItem(newItem) {
    // CheckCapacity checks whether the maximum size has been reached and,
    // if it has, changes array.length to the new size
    if (CheckCapacity(counter + 1)) {
        array[++counter] = newItem;
    }
}

Before testing this, I thought to myself, "since I have a new size for the array when I call CheckCapacity(counter + 1), internally it (the JavaScript Array) won't have to perform as many operations as in the first function, since I make sure that there is space available, more than necessary", i.e., the array[++counter] = newItem line in the second function should be faster than the same line in the first function.

I've even used different arrays which contained pre-calculated sizes for the one holding the items; it still was slower.

So back to my question: how does the implementation of a JavaScript Array allocate the necessary size? Am I correct to assume that not much can be done to speed this process up? To me it made sense that one of the drawbacks of having an object (the JavaScript Array) that dynamically allocates more memory each time a new item is added would be the loss of speed (unless it has pretty good algorithms implemented, but I don't know, hence my question).

Solution

In JavaScript, an Array is an abstraction. How it is implemented (and when allocation and resizing are performed) is left up to the JavaScript engine - the ECMAScript specification does not dictate how this is done. So there is basically no precise way to know, short of studying a particular engine.

In practice, JavaScript engines are very clever about how they allocate memory and about making sure not to allocate too much. In my opinion, they are far more sophisticated than C#'s List -- because JavaScript engines can dynamically change the underlying data structure depending on the situation. The algorithms vary, but most will consider whether there are any "holes" in your array:

var array = [];
array[0] = "foo";            // Is a resizable array
array[1] = "bar";            // Is a resizable array
array[2] = "baz";            // Is a resizable array
array[1000000] = "hello";    // Is now a hash table
console.log(array[1000000]); // "hello"

If you use arrays normally, with contiguous keys starting at zero, then there are no "holes", and most JavaScript engines will represent the JavaScript array using a resizable array data structure. Now consider the fourth assignment: I've created a so-called "hole" roughly a million slots wide (the hole spans slots 3-999999). It turns out that JavaScript engines are clever enough not to allocate ~1 million slots in memory for this massive hole. The engine detects that there is a hole and will typically switch to representing the JavaScript array with a dictionary / hash-table-like data structure to save space (the exact structure is engine-specific). It won't reserve space for the hole; it stores just four mappings: (0, "foo"), (1, "bar"), (2, "baz"), (1000000, "hello").

Unfortunately, accessing the Array is now slower for the engine, because it has to compute a hash and look the key up instead of indexing directly into a contiguous block. When there are no holes, the engine uses a resizable array and access times are quicker; when there is a hole, the Array's performance is slower. The common terminology is that an Array is a dense array when it has no holes (it uses a resizable array = better performance), and a sparse array when it has one or more holes (it uses a hash table = slower performance). For best performance in general, try to use dense arrays.
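
The engine's internal representation cannot be inspected from a script, but the holes themselves can. Below is a minimal sketch, assuming a hypothetical helper called countHoles (not part of any library), that uses the standard in operator to tell a hole from a stored element:

```javascript
// Hypothetical helper: counts the indices below `length` that hold no element.
// It observes holes at the language level only; it proves nothing about which
// internal data structure a particular engine has chosen.
function countHoles(arr) {
    var holes = 0;
    for (var i = 0; i < arr.length; i++) {
        if (!(i in arr)) {
            holes++;
        }
    }
    return holes;
}

var dense = [];
dense[0] = "foo";
dense[1] = "bar";
dense[2] = "baz";

var sparse = dense.slice();   // copy, so `dense` stays untouched
sparse[1000000] = "hello";

console.log(countHoles(dense));          // 0
console.log(sparse.length);              // 1000001
console.log(Object.keys(sparse).length); // 4
console.log(countHoles(sparse));         // 999997
```

Only four keys exist on the sparse array even though its length is just over a million, which matches the four mappings listed above.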

Now to finish off, let me tell you that the following is a bad idea:

var array = new Array(1000000);
array[0] = "foo";               // Is a hash table

The array above has a hole of size ~1 million (it behaves like this: ["foo", undefined, undefined, ... undefined]), and so it is likely using a hash table as the underlying data structure. So implementing the resizing yourself is a bad idea - it will create a hole and give worse performance rather than better. You're only confusing the JavaScript engine.

This is what your code was doing: your array always had a hole in it and was therefore using a hash table as the underlying data structure, which gives slower performance than an array without any holes (i.e. the first version of your code).
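
The difference between the two versions can be made visible without timing anything. The sketch below is a reconstruction for illustration, not the asker's actual code: the value of INITIAL_SIZE, the doubling rule, and the names addGrowing, addPreallocated and ensureCapacity are all assumptions. It also writes with counter++ rather than ++counter, so that index 0 is filled.

```javascript
var INITIAL_SIZE = 8; // assumed value, chosen only for the example

// Version 1: let the engine grow the array.
var growing = [];
function addGrowing(item) {
    growing.push(item);
}

// Version 2: manual capacity management through array.length.
var preallocated = new Array(INITIAL_SIZE);
var counter = 0;

// Hypothetical stand-in for the question's CheckCapacity.
function ensureCapacity(needed) {
    if (needed > preallocated.length) {
        preallocated.length = preallocated.length * 2; // assumed doubling rule
    }
}

function addPreallocated(item) {
    ensureCapacity(counter + 1);
    preallocated[counter++] = item;
}

for (var i = 0; i < 10; i++) {
    addGrowing(i);
    addPreallocated(i);
}

console.log(growing.length);      // 10
console.log(9 in growing);        // true
console.log(preallocated.length); // 16
console.log(15 in preallocated);  // false
```

After ten insertions the second array reports a length of 16, yet indices 10 to 15 are holes. The logical size has to be tracked separately in counter, and the array is never free of holes.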

"Am I correct to assume that not much can be done to speed this process up?"

Yes, there is little to be done on the user's side regarding pre-allocation of space. To speed up JavaScript arrays in general, you want to avoid creating sparse arrays (avoid creating holes):

  1. Don't pre-allocate using new Array(size). Instead "grow as you go". The engine will work out the size of the underlying resizable array itself.
  2. Use contiguous integer keys starting at 0. Don't start from a big integer. Don't add keys that are not integers (e.g. don't use strings as keys).
  3. Try not to delete keys in the middle of arrays (don't delete the element at index 5 from an array with indices 0-9 filled in).
  4. Don't convert back and forth between dense and sparse arrays (i.e. don't repeatedly add and remove holes). There's an overhead for the engine when it converts between the resizable array and hash-table representations.
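
A short sketch of points 2 and 3, using only standard Array operations (the variable names are illustrative). It shows that delete leaves a hole behind while splice keeps the array contiguous, and that a string key is not counted as an element:

```javascript
var viaDelete = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
delete viaDelete[5];       // leaves a hole at index 5; length is unchanged

var viaSplice = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
viaSplice.splice(5, 1);    // removes the element and shifts the rest down

console.log(viaDelete.length, 5 in viaDelete); // 10 false
console.log(viaSplice.length, 5 in viaSplice); // 9 true
console.log(viaSplice[5]);                     // 6

var tagged = [1, 2, 3];
tagged.label = "numbers";  // a string key: an ordinary property, not an element
console.log(tagged.length); // 3
```

Note that splice has to shift every later element, so it trades some work at removal time for keeping the array dense.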

"The disadvantage of [JavaScript Arrays over C# Lists is that they] dynamically allocate more memory each time a new item is added"

No, not necessarily. C# Lists and JavaScript Arrays are basically the same when the JavaScript array has no holes. Both are resizable arrays. The difference is that:

  1. C# Lists give the user more control over the behaviour of the resizable array. In JavaScript, you have no control over it -- it's inside the engine.
  2. C# Lists allow the user to preallocate memory for better performance, whereas in JavaScript, you should let the engine automatically work out how to preallocate memory in the underlying resizable array for better performance.
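
Putting this together, a List-like collection can simply wrap a dense array and leave growth to the engine. The following is only a sketch under that assumption; the List constructor and its add, get, removeAt and count methods are hypothetical names, not an existing API:

```javascript
// Hypothetical minimal List wrapper; all names are illustrative only.
function List() {
    this.items = [];              // starts empty and stays dense
}

List.prototype.add = function (item) {
    this.items.push(item);        // the engine decides when and how to grow
};

List.prototype.get = function (index) {
    if (index < 0 || index >= this.items.length) {
        throw new RangeError("Index out of range: " + index);
    }
    return this.items[index];
};

List.prototype.removeAt = function (index) {
    if (index < 0 || index >= this.items.length) {
        throw new RangeError("Index out of range: " + index);
    }
    return this.items.splice(index, 1)[0]; // no hole is left behind
};

List.prototype.count = function () {
    return this.items.length;
};

var list = new List();
list.add("foo");
list.add("bar");
list.add("baz");
list.removeAt(1);
console.log(list.count()); // 2
console.log(list.get(1));  // baz
```

There is no capacity field and no manual resizing here: the array's own length is the element count, so no separate counter is needed.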
