支持 O(1) 随机访问和最坏情况 O(1) 追加的数据结构? [英] A data structure supporting O(1) random access and worst-case O(1) append?

查看:24
本文介绍了支持 O(1) 随机访问和最坏情况 O(1) 追加的数据结构?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我实现了一个可调整大小的索引集合,它使用数组来存储其元素(如 .NET 中的 List 或 Java 中的 ArrayList)具有 分摊 O(1) 插入时间 在集合的末尾.但是,在集合刚刚达到其容量的关键时刻总是有一个讨厌的插入,而下一次插入需要将内部数组中的所有元素完整复制到一个新的(大概是原来的两倍).

I realize a resizable indexed collection that uses an array to store its elements (like List<T> in .NET or ArrayList in Java) has amortized O(1) insertion time at the end of the collection. But then there's always that one pesky insertion at critical junctures where the collection has just reached its capacity and the next insertion requires a full copy of all elements in the internal array to a new one (presumably twice as large).

一个常见的错误(在我看来)是使用链表来修复"这个问题;但我相信为每个元素分配一个节点的开销可能会非常浪费,而且在这种罕见的情况下,保证 O(1) 插入的好处会相形见绌,因为​​数组插入的成本很高——事实上,当每个 other 数组插入要便宜得多(而且速度更快).

A common mistake (in my opinion) is to go with a linked list to "fix" this problem; but I believe the overhead of allocating a node for every element can be quite wasteful, and in fact would dwarf the benefit of a guaranteed O(1) insertion in that rare case that the array insertion is costly—when, in fact, every other array insertion is significantly cheaper (and faster).

我认为可能有意义的是一种由数组链表组成的混合方法,其中每次当前头"数组达到其容量时,都会将一个两倍大的新数组添加到链表中.然后不需要复制,因为链表仍然具有原始数组.从本质上讲,这似乎(对我而言)类似于 ListArrayList 方法,不同之处在于您以前在任何地方都会产生复制所有元素的成本内部数组,这里你只需要分配一个数组加上单个节点插入的成本.

What I was thinking might make sense is a hybrid approach consisting of a linked list of arrays, where every time the current "head" array reaches its capacity, a new array twice as large is added to the linked list. Then no copy would be necessary since the linked list would still have the original array. Essentially this seems analogous (to me) to the List<T> or ArrayList approach, except that wherever you previously would've incurred the cost of copying all the elements of the internal array, here you only incur the cost of allocating a new array plus a single node insertion.

当然,如果需要,这会使其他功能复杂化(例如,在集合中间插入/删除);但正如我在标题中所表达的,我真的只是在寻找一个仅添加(和迭代)集合.

Of course, this would complicate other features if they were desired (e.g., inserting/removing into/from the middle of the collection); but as I've expressed in the title, I'm really just looking for an add-only (and iterate) collection.

是否有任何数据结构非常适合此目的?或者,你能自己想一个吗?

Are there any data structures out there ideally suited to this purpose? Or, can you think of one yourself?

推荐答案

有一个漂亮的结构叫做可扩展数组,它有最坏情况 O(1) 插入和 O(n) 内存开销(也就是说,它是渐近可比的到动态数组,但有 O(1) 最坏情况插入).诀窍是采用向量使用的方法 - 加倍和复制 - 但要使复制变得懒惰.例如,假设您有一个包含四个元素的数组,如下所示:

There is a beautiful structure called an extendible array that has worst-case O(1) insertion and O(n) memory overhead (that is, it's asymptotically comparable to a dynamic array, but has O(1) worst-case insertion). The trick is to take the approach that the vector uses - doubling and copying - but to make the copying lazy. For example, suppose you have an array of four elements like this one:

[1] [2] [3] [4]

如果你想添加一个新数字,比如 5,你首先要分配一个两倍大的数组:

If you want to add a new number, say 5, you begin by allocating an array that's twice as large:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

接下来,将 5 插入到新数组中:

Next, you insert 5 into the new array:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [5] [ ] [ ] [ ]

最后,将旧数组中的 4 下拉到新数组中:

Finally, pull down the 4 from the old array into the new:

[1] [2] [3] [ ]
[ ] [ ] [ ] [4] [5] [ ] [ ] [ ]

从现在开始,每次插入时,将元素添加到新数组并从旧数组中再下拉一个元素.例如,添加 6 后,我们会得到

From now on, any time you do an insert, add the element to the new array and pull down one more element from the old array. For example, after adding 6, we'd get

[1] [2] [ ] [ ]
[ ] [ ] [3] [4] [5] [6] [ ] [ ]

再插入两个值后,我们就到这里了:

After inserting two more values, we end up here:

[ ] [ ] [ ] [ ]
[1] [2] [3] [4] [5] [6] [7] [8]

如果我们现在需要再添加一个元素,我们丢弃现在为空的旧数组并分配一个两倍于当前数组的数组(能够容纳 16 个元素):

If we now need to add one more element, we discard the now-empty old array and allocate an array twice as large as the current array (capable of holding 16 elements):

[1] [2] [3] [4] [5] [6] [7] [8]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

并重复此过程.扣除内存分配的成本(通常是数组大小的次线性),每次插入最多执行 O(1) 工作.

And repeat this process. Discounting the cost of a memory allocation (which is usually sublinear in the size of the array), you do at most O(1) work per insertion.

查找仍然是 O(1),因为您只是决定要查找两个数组中的哪一个,而中间的插入是 O(n),因为这是改组.

Lookups are still O(1), since you just decide which of the two arrays to look in, while insertions in the middle are O(n) because of the shuffling.

如果你好奇,我有这个结构的Java实现在我的个人网站.我不知道您会发现它有多大用处,但非常欢迎您尝试一下.

If you're curious, I have a Java implementation of this structure on my personal site. I don't know how useful you'll find it, but you're more than welcome to try it out.

如果你想花一点时间阅读一篇研究论文并尝试实现一个相当复杂的数据结构,你可以在 O(√n) 空间中得到相同的结果(最坏情况 O(1) 追加)使用本文中的想法来计算开销(顺便说一下,这可以证明是最佳的). 我从来没有真正实现这一点,但如果内存是一种非常稀缺的资源,它当然值得一读.有趣的是,它使用了上述结构作为子程序!

If you want to invest a bit of time reading over a research paper and trying to implement a fairly complex data structure, you can get the same result (worst-case O(1) append) in O(√n) space overhead (which is provably optimal, by the way) using the ideas in this paper. I never got around to actually implementing this, but it's certainly well-worth the read if memory is a super-scarce resource. Interestingly, it uses this above construction as a subroutine!

这篇关于支持 O(1) 随机访问和最坏情况 O(1) 追加的数据结构?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆