Javascript:有效地将项目移入和移出固定大小的数组(Javascript: Efficiently move items in and out of a fixed-size array)

7 IT屋

If I have an array that I want to be of fixed size N for the purpose of caching the most recent of N items, then once limit N is reached, I'll have to get rid of the oldest item while adding the newest item.

Note: I don't care if the newest item is at the beginning or end of the array, just as long as the items get removed in the order that they are added.

The obvious ways are either:

  • push() and shift() (so that cache[0] contains the oldest item), or
  • unshift() and pop() (so that cache[0] contains the newest item)

Basic idea:

var cache = [], limit = 10000;

function cacheItem( item ) {

    // In case we want to do anything with the oldest item
    // before it's gone forever.
    var oldest = [];

    cache.push( item );

    // Use WHILE and >= instead of just IF in case the cache
    // was altered by more than one item at some point.
    while ( cache.length >= limit ) {
        oldest.push( cache.shift() );
    }

    return oldest;
}

However, I've read about memory issues with shift and unshift since they alter the beginning of the array and move everything else around, but unfortunately, one of those methods has to be used to do it this way!

Qs:

  1. Are there other ways to do this that would be better performance-wise?
  2. If the two ways I already mentioned are the best, are there specific advantages/disadvantages I need to be aware of?


Conclusion

After doing some more research into data structures (I've never programmed in other languages, so if it's not native to Javascript, I likely haven't heard of it!) and doing a bunch of benchmarking in multiple browsers with both small and large arrays as well as small and large numbers of reads / writes, here's what I found:

  • The 'circular buffer' method proposed by Bergi is hands-down THE best as far performance (for reasons explained in the answer and comments), and hence it has been accepted as the answer. However, it's not as intuitive, and makes it difficult to write your own 'extra' functions (since you always have to take offset into account). If you're going to use this method, I recommend an already-created one like this circular buffer on GitHub.
  • The 'pop/unpush' method is much more intuitive, and performs fairly well, accept at the most extreme numbers.
  • The 'copyWithin' method is, sadly, terrible for performance (tested in multiple browsers), quickly creating unacceptable latency. It also has no IE support. It's such a simple method! I wish it worked better.
  • The 'linked list' method, proposed in the comments by Felix Kling, is actually a really good option. I initially disregarded it because it seemed like a lot of extra stuff I didn't need, but to my surprise....

What I actually needed was a Least Recently Used (LRU) Map (which employs a doubly-linked list). Now, since I didn't specify my additional requirements in my original question, I'm still marking Bergi's answer as the best answer to that specific question. However, since I needed to know if a value already existed in my cache, and if so, mark it as the newest item in the cache, the additional logic I had to add to my circular buffer's add() method (primarily indexOf()) made it not much more efficient than the 'pop/unpush' method. HOWEVER, the performance of the LRUMap in these situations blew both of the other two out of the water!

So to summarize:

  1. Linked List -- most options while still maintaining great performance
  2. Circular Buffer -- best performance for just adding and getting
  3. Pop / Unpush -- most intuitive and simplest
  4. copyWithin -- terrible performance currently, no reason to use
解决方案

If I have an array that caches the most recent of N items, once limit N is reached, I'll have to get rid of the oldest while adding the newest.

You are not looking to copy stuff around within the array, which would take O(n) steps every time.

Instead, this is the perfect use case for a ring buffer. Just keep an offset to the "start" and "end" of the list, then access your buffer with that offset and modulo its length.

var cache = new Array(10000);
cache.offset = 0;

function cacheItem(item) {
    cache[cache.offset++] = item;
    cache.offset %= cache.length;
}
function cacheGet(i) { // backwards, 0 is most recent
    return cache[(cache.offset - 1 - i + cache.length) % cache.length];
}

如果我有一个要固定大小为 N 的数组,以便缓存最新的 N 个项目,则一次限制 N 已达到,我将必须摆脱最旧的项目,同时添加最新的项目。



注意:我不会不必关心最新的项目是在数组的开头还是结尾,只要这些项目按照添加的顺序被删除即可。



显而易见方式是:




  • push()和 shift ()(这样 cache [0] 包含最旧的项目),或

  • unshift()和 pop()(这样 cache [0] 包含最新项)



基本思路:



 < code> var cache = [],限制= 10000; 

函数cacheItem(item){

//如果我们想对最旧的项目
//做任何事情,直到永远消失。
var oldest = [];

cache.push(item);

//使用WHILE和> =而不是仅使用IF,以防某些时候缓存
//被多个项更改。
while(cache.length> = limit){
oldest.push(cache.shift());
}

返回最旧;
}


但是,我读过有关的内存问题shift 和 unshift ,因为它们更改了数组的开头并移动了其他所有内容,但不幸的是,其中的 方法必须采用这种方式!



问:




  1. 还有其他方法可以达到更好的性能吗?

  2. 如果我已经提到的两种方法是最好的,是否有特定的优点/缺点?我需要知道吗?





结论



在对数据结构进行了更多研究之后(我从未使用过其他语言进行编程,因此,如果它不是Javascript的本机,我可能还没有听说过!),然后进行一堆在具有大小阵列的浏览器中的基准测试,无论是大大小小的读取/写入,都是我发现的:





我真正需要的是一个最近不使用(LRU) )地图(使用双向链接列表)。现在,由于我没有在原始问题中指定其他要求,因此我仍将Bergi的答案标记为对该特定问题的最佳答案。但是,由于我需要知道缓存中是否已经存在一个值,并且如果存在,请将其标记为缓存中的最新项,因此必须向循环缓冲区的 add()中添加其他逻辑。 方法(主要是 indexOf())使它的效率不比'pop / unpush'方法高。但是,在这种情况下,LRUMap的性能使其他两个功能都无法自拔!



所以总结一下:




  1. 链接列表-大多数选项,同时仍保持出色的性能

  2. 循环缓冲区-仅添加和获取的最佳性能

  3. Pop / Unpush-最直观,最简单

  4. copyWithin -当前性能糟糕,没有理由使用


解决方案

如果我有一个缓存最新数组的数组N个项目中,一旦达到N个限制,我就必须删除最旧的项目,同时添加最新的项目。




不想复制数组中的内容,每次都会花费 O(n)步。



相反,这是环形缓冲区的理想用例。只需保持列表的"开始"和"结束"的偏移量,然后使用该偏移量并对其长度取模就可以访问缓冲区。



  var cache = new Array(10000); 
cache.offset = 0;

函数cacheItem(item){
cache [cache.offset ++] = item;
cache.offset%= cache.length;
}
function cacheGet(i){//向后,0是最近的
return cache [(cache.offset-1-i + cache.length)%cache.length];
}

本文地址:IT屋 » Javascript:有效地将项目移入和移出固定大小的数组