A data structure supporting O(1) random access and worst-case O(1) append?

Question

I realize a resizable indexed collection that uses an array to store its elements (like List<T> in .NET or ArrayList in Java) has amortized O(1) insertion time at the end of the collection. But then there's always that one pesky insertion at critical junctures where the collection has just reached its capacity and the next insertion requires a full copy of all elements in the internal array to a new one (presumably twice as large).

A common mistake (in my opinion) is to go with a linked list to "fix" this problem; but I believe the overhead of allocating a node for every element can be quite wasteful, and in fact would dwarf the benefit of a guaranteed O(1) insertion in that rare case that the array insertion is costly—when, in fact, every other array insertion is significantly cheaper (and faster).

What I was thinking might make sense is a hybrid approach consisting of a linked list of arrays, where every time the current "head" array reaches its capacity, a new array twice as large is added to the linked list. Then no copy would be necessary since the linked list would still have the original array. Essentially this seems analogous (to me) to the List<T> or ArrayList approach, except that wherever you previously would've incurred the cost of copying all the elements of the internal array, here you only incur the cost of allocating a new array plus a single node insertion.
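The linked-list-of-doubling-arrays idea above can be sketched in a few lines of Java. If block k holds 2^k slots, then blocks 0 through k−1 hold exactly 2^k − 1 elements in total, so a global index can be mapped to a (block, offset) pair with bit arithmetic and lookups stay O(1). The class name `BlockList` is hypothetical, not from any library; the block directory here is an `ArrayList`, which only grows O(log n) times, so its own resizing cost is negligible.

```java
import java.util.ArrayList;
import java.util.List;

/** Append-only list built from a chain of arrays whose sizes double.
 *  Appends never copy existing elements; block k has capacity 2^k. */
class BlockList<T> {
    private final List<T[]> blocks = new ArrayList<>();
    private int size = 0;

    @SuppressWarnings("unchecked")
    public void add(T value) {
        // Index 'size' lands in block floor(log2(size + 1)).
        int block = 31 - Integer.numberOfLeadingZeros(size + 1);
        if (block == blocks.size()) {
            // Head block is full: allocate the next one, twice as large.
            blocks.add((T[]) new Object[1 << block]);
        }
        blocks.get(block)[size + 1 - (1 << block)] = value;
        size++;
    }

    public T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        int block = 31 - Integer.numberOfLeadingZeros(i + 1);
        return blocks.get(block)[i + 1 - (1 << block)];
    }

    public int size() { return size; }
}
```

One trade-off worth noting: unlike a plain array, this layout is not contiguous in memory, so iteration loses some cache friendliness even though the asymptotics are the same.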

Of course, this would complicate other features if they were desired (e.g., inserting/removing into/from the middle of the collection); but as I've expressed in the title, I'm really just looking for an add-only (and iterate) collection.

Are there any data structures out there ideally suited to this purpose? Or, can you think of one yourself?

Answer

There is a beautiful structure called an extendible array that has worst-case O(1) insertion and O(n) memory overhead (that is, it's asymptotically comparable to a dynamic array, but has O(1) worst-case insertion). The trick is to take the approach that the vector uses - doubling and copying - but to make the copying lazy. For example, suppose you have an array of four elements like this one:

[1] [2] [3] [4]

If you want to add a new number, say 5, you begin by allocating an array that's twice as large:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

Next, you insert 5 into the new array:

[1] [2] [3] [4]
[ ] [ ] [ ] [ ] [5] [ ] [ ] [ ]

Finally, pull down the 4 from the old array into the new:

[1] [2] [3] [ ]
[ ] [ ] [ ] [4] [5] [ ] [ ] [ ]

From now on, any time you do an insert, add the element to the new array and pull down one more element from the old array. For example, after adding 6, we'd get

[1] [2] [ ] [ ]
[ ] [ ] [3] [4] [5] [6] [ ] [ ]

After inserting two more values, we end up here:

[ ] [ ] [ ] [ ]
[1] [2] [3] [4] [5] [6] [7] [8]

If we now need to add one more element, we discard the now-empty old array and allocate an array twice as large as the current array (capable of holding 16 elements):

[1] [2] [3] [4] [5] [6] [7] [8]
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]

And repeat this process. Discounting the cost of a memory allocation (which is usually sublinear in the size of the array), you do at most O(1) work per insertion.

Lookups are still O(1), since you just decide which of the two arrays to look in, while insertions in the middle are O(n) because of the shuffling.
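The walkthrough above can be condensed into a minimal Java sketch. The key invariant is that when the array of length L fills, the next epoch appends into the upper L slots of the new length-2L array while migrating the L old elements one per append, so migration finishes exactly when the new array fills. The class name `ExtendibleArray` is mine for illustration; this is not the answer's actual implementation from the linked site.

```java
/** Doubling array with lazy copying: worst-case O(1) append. */
class ExtendibleArray<T> {
    private Object[] oldArr = new Object[0];
    private Object[] newArr = new Object[1];
    private int size = 0;      // total elements stored
    private int migrated = 0;  // elements already pulled down into newArr

    public void add(T value) {
        if (size == newArr.length) {
            // Current array is full: start a new epoch with double the capacity.
            oldArr = newArr;
            newArr = new Object[2 * oldArr.length];
            migrated = 0;
        }
        newArr[size] = value;
        if (migrated < oldArr.length) {
            // Lazily copy one element from the old array per append.
            newArr[migrated] = oldArr[migrated];
            migrated++;
        }
        size++;
    }

    @SuppressWarnings("unchecked")
    public T get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        // Indices below 'migrated' have been copied down; indices at or past
        // the old array's length were appended directly into newArr.
        return (T) ((i < migrated || i >= oldArr.length) ? newArr[i] : oldArr[i]);
    }
}
```

Each `add` does a constant amount of work (one write plus at most one migration copy), and `get` touches exactly one of the two arrays, matching the O(1) bounds claimed above.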

If you're curious, I have a Java implementation of this structure on my personal site. I don't know how useful you'll find it, but you're more than welcome to try it out.

Hope this helps!

EDIT: If you want to invest a bit of time reading a research paper and implementing a fairly complex data structure, you can get the same result (worst-case O(1) append) with O(√n) space overhead (which is provably optimal, by the way) using the ideas in this paper. I never got around to actually implementing it, but it's certainly well worth the read if memory is a super-scarce resource. Interestingly, it uses the construction above as a subroutine!
