How are the JS Arrays internally resizing?

Question

I've been trying to implement a collection class in JS (similar to List in C#) that has some custom functionality. I also wanted it to be reasonably optimized (I've read some articles on how to use JS Arrays properly). So I thought to myself: "if we don't define an initial size for an Array and we keep adding objects to it, internally it will have to allocate a new size for each insertion, and that must be slow. I can avoid this by allocating a new size myself (changing the array length), somewhat like it is done in C#, doubling the size whenever the maximum capacity is reached (I know it's not this trivial, but it's a start)".

I tried to implement this idea and found out that it is way slower (~10x slower):

//This simplified approach of my implementation is faster...
var array = [];
var counter = 0;
function addItem(newItem) {
    array[++counter] = newItem;
}

//..than this version that resizes the array when a limit is reached
var array = [];
array.length = INITIAL_SIZE;
/*
 Alternatively
 var array = new Array(INITIAL_SIZE);
*/
var counter = 0;
function addItem(newItem) {
    if( CheckCapacity(counter + 1) ) { //Function that checks if the maximum size is reached and if it is, change the array.length to the new size
        array[++counter] = newItem;
    }
}

Before testing this, I thought to myself: "since I have a new size for the array when I call CheckCapacity(counter + 1), internally it (the JS Array) won't have to perform as many operations as in the first function, because I make sure that there is more space available than necessary". In other words, the array[++counter] = newItem line in the second function should be faster than the same line in the first function. I've even used different arrays which contained pre-calculated sizes for the one holding the items; it was still slower.

So back to my question: how does the implementation of the JS Array allocate the necessary size? Am I correct to assume that not much can be done to speed this process up? To me it made sense that one of the drawbacks of having an object (the JS Array) that dynamically allocates more memory each time a new item is added would be a loss of speed (unless it has pretty good algorithms implemented, but I don't know, hence my question).

Recommended answer

In Javascript, an Array is an abstraction. How it is implemented (and when allocation and resizing are performed) is left up to the Javascript engine - the ECMAScript specification does not dictate how this is done. So there is basically no precise way to know.

In practice, Javascript engines are very clever about how they allocate memory, and they make sure not to allocate too much. In my opinion, they are far more sophisticated than C#'s List, because Javascript engines can dynamically change the underlying data structure depending on the situation. The algorithms vary, but most will consider whether there are any "holes" in your array:

var array = [];
array[0] = "foo"          // is a resizable array
array[1] = "bar"          // is a resizable array
array[2] = "baz"          // is a resizable array
array[1000000] = "hello"; // is now a hash table
console.log(array[1000000]) // "hello"

If you use arrays normally, with contiguous keys starting at zero, then there are no "holes" and most Javascript engines will represent the Javascript array using a resizable array data structure. Now consider the fourth assignment: it creates a so-called "hole" of roughly a million slots (the hole spans slots 3-999999). It turns out that Javascript engines are clever enough not to allocate ~1 million slots in memory for this massive hole. Once the engine detects the hole, it will typically represent the Javascript array using a dictionary / hash-table-like data structure to save space (the exact structure is engine-specific). It won't reserve space for the hole, just four mappings: (0, "foo"), (1, "bar"), (2, "baz"), (1000000, "hello").
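As a small check of the point above, the following sketch (plain JavaScript, nothing engine-specific) shows that only four keys actually exist on the array even though its length is just over a million. It says nothing about which internal representation a particular engine picks; that part stays an implementation detail. The variable names are chosen only for illustration.

```javascript
// Build the same array as in the example above.
const sparse = [];
sparse[0] = "foo";
sparse[1] = "bar";
sparse[2] = "baz";
sparse[1000000] = "hello";

// length is one more than the highest index, not the number of stored items.
const reportedLength = sparse.length;       // 1000001

// Object.keys only lists indices that actually exist, so the hole is skipped.
const storedKeys = Object.keys(sparse);     // ["0", "1", "2", "1000000"]

// The `in` operator distinguishes a hole from a stored value.
const slotThreeExists = 3 in sparse;        // false
const lastSlotExists = 1000000 in sparse;   // true

console.log(reportedLength, storedKeys, slotThreeExists, lastSlotExists);
```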

Unfortunately, accessing the Array is now slower for the engine, because it has to hash the key and look it up instead of indexing directly into contiguous memory. When there are no holes, the engine uses a resizable array and access is quick; when there is a hole, the Array's performance is slower. The common terminology is to call an Array without any holes a dense array (it uses a resizable array = better performance), and an Array with one or more holes a sparse array (it uses a hash table = slower performance). For best performance in general, try to use dense arrays.
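To make the dense / sparse distinction concrete, here is a minimal sketch of a helper that reports whether an array has holes. The name hasHoles is hypothetical, and this is a language-level check only: it cannot tell you which internal representation the engine actually chose. It scans every index, so it is meant for illustration rather than for very large arrays.

```javascript
// Returns true if some index below arr.length holds no element (a "hole").
function hasHoles(arr) {
    for (let i = 0; i < arr.length; i++) {
        if (!(i in arr)) {
            return true;
        }
    }
    return false;
}

const dense = ["foo", "bar", "baz"];
const withHole = ["foo", , "baz"];          // index 1 is a hole
const withUndefined = ["foo", undefined];   // undefined is a stored value, not a hole

console.log(hasHoles(dense));         // false
console.log(hasHoles(withHole));      // true
console.log(hasHoles(withUndefined)); // false
```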

Now to finish off, let me tell you that the following is a bad idea:

var array = new Array(1000000);
array[0] = "foo";               // is a hash table

The array above has a hole of size ~1 million (it behaves like ["foo", undefined, undefined, ... undefined], except that the empty slots are not actually stored), and so it is using a hash table as the underlying data structure. So implementing the resizing yourself is a bad idea: it creates a hole and gives worse performance rather than better. You're only confusing the Javascript engine. This is what your code was doing: your array always had a hole in it and therefore used a hash table as the underlying data structure, giving slower performance than an array without any holes (i.e. the first version of your code).
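The difference between pre-sizing and growing can be observed from script as well. The sketch below uses a hypothetical SIZE of 1000 purely for illustration and counts how many indices really exist in each case; it does not measure speed, and it does not reveal the engine's internal representation.

```javascript
const SIZE = 1000; // hypothetical size, chosen only for illustration

// Pre-sized: length is SIZE, but no index exists yet, so every slot is a hole.
const preSized = new Array(SIZE);
preSized[0] = "foo";

// Grown as you go: every index from 0 to length - 1 holds a value.
const grown = [];
for (let i = 0; i < SIZE; i++) {
    grown.push(i);
}

// Count the indices that really exist on each array.
const preSizedStored = Object.keys(preSized).length; // 1
const grownStored = Object.keys(grown).length;       // 1000

console.log(preSized.length, preSizedStored); // 1000 1
console.log(grown.length, grownStored);       // 1000 1000
```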


Am I correct to assume that not much can be done to speed this process up?

Yes, there is little to be done on the user's side regarding pre-allocation of space. To speed up Javascript arrays in general, you want to avoid creating sparse arrays (avoid creating holes):


  1. Don't pre-allocate using new Array(size). Instead "grow as you go". The engine will work out the size of the underlying resizable array itself.
  2. Use contiguous integer keys starting at 0. Don't start from a big integer. Don't add keys that are not integers (e.g. don't use strings as keys).
  3. Try not to delete keys in the middle of arrays (don't delete the element at index 5 from an array with indices 0-9 filled in).
  4. Don't convert to and from dense and sparse arrays (i.e. don't repeatedly add and remove holes). There's an overhead for the engine to convert to and from the resizable array vs hash-table representations.
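As an illustration of point 3, the sketch below contrasts the delete operator, which leaves a hole behind, with Array.prototype.splice, which removes the element and shifts the remaining ones down so the array stays dense. The variable names are illustrative only. Note that splice has to move the later elements, so it has its own cost on large arrays.

```javascript
// Removing with `delete` leaves a hole: the length does not change.
const viaDelete = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
delete viaDelete[5];
console.log(viaDelete.length, 5 in viaDelete); // 10 false

// Removing with splice shifts later elements down: no hole is created.
const viaSplice = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
viaSplice.splice(5, 1);
console.log(viaSplice.length, viaSplice[5]);   // 9 6
```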




The disadvantage of [JS Arrays over C# Lists is that they] dynamically allocate more memory each time a new item is added

No, not necessarily. C# Lists and Javascript Arrays are basically the same when the Javascript array has no holes. Both are resizable arrays. The difference is that:


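  1. C# Lists give the user more control over the behaviour of the resizable array. In Javascript, you have no control over it - it is internal to the engine.
  2. C# Lists allow the user to pre-allocate memory for better performance, whereas in Javascript you should let the engine work out automatically how to pre-allocate memory in the underlying resizable array.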
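Putting the advice together, a List-like collection along the lines the question describes could simply wrap a plain array and let the engine handle growth. This is a minimal sketch under that assumption, not a definitive implementation: the class name List and its methods add, get, removeAt and count are hypothetical and only loosely modelled on the C# type.

```javascript
// Hypothetical List-like wrapper; the class and method names are illustrative.
class List {
    constructor() {
        this.items = [];           // starts empty, no pre-allocation
    }

    // Appends at index === length, so the backing array never gets a hole.
    add(item) {
        this.items.push(item);
        return this;
    }

    // Bounds-checked read, so callers cannot touch indices past the end.
    get(index) {
        if (!Number.isInteger(index) || index < 0 || index >= this.items.length) {
            throw new RangeError("Index out of range: " + index);
        }
        return this.items[index];
    }

    // Removes with splice rather than delete, keeping the array dense.
    removeAt(index) {
        this.get(index);           // reuse the bounds check
        this.items.splice(index, 1);
    }

    get count() {
        return this.items.length;
    }
}

const list = new List();
list.add("a").add("b").add("c");
list.removeAt(1);
console.log(list.count, list.get(0), list.get(1)); // 2 a c
```

The design choice here is that no code path assigns to length or writes past the end of the array, so the backing array stays dense throughout.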
