支持O(1)随机访问和最坏情况O(1)附加的数据结构? [英] A data structure supporting O(1) random access and worst-case O(1) append?

查看:168
本文介绍了支持O(1)随机访问和最坏情况O(1)附加的数据结构?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我意识到一个可调整大小的索引集合,它使用一个数组来存储它的元素(如 List 在.NET或 ArrayList 在Java中)在收集结束时分摊O(1)插入时间 。但是,随着一个一个的插入,在集合中 已经达到其容量的关键时刻,下一个插入需要内部数组中所有元素的完整副本新的(大概是两倍的大小)。

I realize a resizable indexed collection that uses an array to store its elements (like List<T> in .NET or ArrayList in Java) has amortized O(1) insertion time at the end of the collection. But then there's always that one pesky insertion at critical junctures where the collection has just reached its capacity and the next insertion requires a full copy of all elements in the internal array to a new one (presumably twice as large).

在我看来,一个常见的错误是使用链表来修复这个问题;但是我相信为每个元素分配一个节点的开销可能相当浪费,实际上,在极少数情况下数组插入成本高昂的情况下实际上会使得保证的O(1)插入的好处更为困难 - 实际上, em>其他数组插入是显着的便宜(而且更快)。

A common mistake (in my opinion) is to go with a linked list to "fix" this problem; but I believe the overhead of allocating a node for every element can be quite wasteful, and in fact would dwarf the benefit of a guaranteed O(1) insertion in that rare case that the array insertion is costly—when, in fact, every other array insertion is significantly cheaper (and faster).

我认为可能有意义的是一个混合方法,包括一个链表列表在每当当前的头阵列达到其容量时,将两倍大的新阵列添加到链表中。那么不需要任何副本,因为链表仍然有原始的数组。基本上,这似乎与 List< T> ArrayList 方法类似(除了我以外)导致复制内部数组的所有元素的成本,这里只需要分配一个新的数组加上一个节点插入的代价。

What I was thinking might make sense is a hybrid approach consisting of a linked list of arrays, where every time the current "head" array reaches its capacity, a new array twice as large is added to the linked list. Then no copy would be necessary since the linked list would still have the original array. Essentially this seems analogous (to me) to the List<T> or ArrayList approach, except that wherever you previously would've incurred the cost of copying all the elements of the internal array, here you only incur the cost of allocating a new array plus a single node insertion.

当然,如果需要,这会使其他功能变得复杂(例如,从集合中间插入/移除);但正如我在标题中所表达的那样,我真的只是寻找一个只添加(和迭代)集合。

Of course, this would complicate other features if they were desired (e.g., inserting/removing into/from the middle of the collection); but as I've expressed in the title, I'm really just looking for an add-only (and iterate) collection.

是那里有什么数据结构非常适合这个目的?或者,你能自己想一个吗?

Are there any data structures out there ideally suited to this purpose? Or, can you think of one yourself?

推荐答案

有一个美丽的结构称为可扩展数组,具有最坏情况O( 1)插入和O(n)内存开销(即,它是渐近地与动态数组相当,但具有O(1)最坏情况插入)。诀窍是采取矢量使用的方法 - 加倍和复制 - 但使复制懒惰。例如,假设你有一个这样的四个元素的数组:

There is a beautiful structure called an extendible array that has worst-case O(1) insertion and O(n) memory overhead (that is, it's asymptotically comparable to a dynamic array, but has O(1) worst-case insertion). The trick is to take the approach that the vector uses - doubling and copying - but to make the copying lazy. For example, suppose you have an array of four elements like this one:

[1] [2] [3] [4]

如果要添加一个新的数字,比如说5,你首先分配一个数组是两次一样大:

If you want to add a new number, say 5, you begin by allocating an array that's twice as large:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

接下来,将5插入到新数组中: p>

Next, you insert 5 into the new array:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [5] [ ] [ ] [ ]

最后,将4从旧数组下拉到新的: p>

Finally, pull down the 4 from the old array into the new:

[1] [2] [3] [ ]
[ ] [ ] [ ] [4] [5] [ ] [ ] [ ]

从现在起,任何时候你做插入,将元素添加到新数组并从旧数组中下拉一个元素。例如,添加6后,我们将获得

From now on, any time you do an insert, add the element to the new array and pull down one more element from the old array. For example, after adding 6, we'd get

[1] [2] [ ] [ ]
[ ] [ ] [3] [4] [5] [6] [ ] [ ]

插入后另外两个值,我们在这里结束:

After inserting two more values, we end up here:

[ ] [ ] [ ] [ ]
[1] [2] [3] [4] [5] [6] [7] [8]

现在需要添加一个元素,我们丢弃现在的旧数组,并分配一个与当前数组相同的数组(可以容纳16个元素)的数组:

If we now need to add one more element, we discard the now-empty old array and allocate an array twice as large as the current array (capable of holding 16 elements):

[1] [2] [3] [4] [5] [6] [7] [8]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

这个过程。内存分配的成本(通常在数组的大小上是线性的),你最多可以每插入一次O(1)。

And repeat this process. Discounting the cost of a memory allocation (which is usually sublinear in the size of the array), you do at most O(1) work per insertion.

查找仍然O(1),因为你只是决定要查看哪两个数组,而中间的插入是O(n),因为这是洗牌。

Lookups are still O(1), since you just decide which of the two arrays to look in, while insertions in the middle are O(n) because of the shuffling.

如果你很好奇,我的个人网站上已经此结构的Java实现。我不知道你会发现多么有用,但是欢迎你尝试一下。

If you're curious, I have a Java implementation of this structure on my personal site. I don't know how useful you'll find it, but you're more than welcome to try it out.

希望这有帮助!

编辑 :如果你想花一点时间阅读研究论文,并试图实现一个相当复杂的数据结构,你可以得到相同的结果(最坏情况O(1)追加)在O(√ n)空间开销(这显然是最佳的,顺便说一句)使用本文中的想法我从来没有真正实现这一点,但如果内存是超稀缺资源,这当然是值得的。有趣的是,它将以上构造用作子程序!

EDIT: If you want to invest a bit of time reading over a research paper and trying to implement a fairly complex data structure, you can get the same result (worst-case O(1) append) in O(√n) space overhead (which is provably optimal, by the way) using the ideas in this paper. I never got around to actually implementing this, but it's certainly well-worth the read if memory is a super-scarce resource. Interestingly, it uses this above construction as a subroutine!

这篇关于支持O(1)随机访问和最坏情况O(1)附加的数据结构?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆