Linked list vs dynamic array for implementing a stack using vector class

Question

I was reading up on the two different ways of implementing a stack: a linked list and a dynamic array. The main advantage of a linked list over a dynamic array is that the linked list does not have to be resized, while a dynamic array has to be resized if too many elements are inserted, wasting a lot of time and memory.

That got me wondering whether this is true for C++, since it has a vector class which automatically resizes whenever new elements are inserted.

Answer

It's difficult to compare the two, because the patterns of their memory usage are quite different.

Vector resizing

A vector resizes itself dynamically as needed. It does that by allocating a new chunk of memory, moving (or copying) the data from the old chunk to the new chunk, then releasing the old one. In a typical case, the new chunk is 1.5x the size of the old (contrary to popular belief, 2x seems to be quite unusual in practice). That means for a short time while reallocating, it needs memory equal to roughly 2.5x as much as the data you're actually storing. The rest of the time, the "chunk" that's in use is a minimum of 2/3rds full, and a maximum of completely full. If all sizes are equally likely, we can expect it to average about 5/6ths full. Looking at it from the other direction, we can expect about 1/6th, or about 17%, of the space to be "wasted" at any given time.
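If you want to see this in practice, a minimal sketch like the following prints each capacity change as elements are pushed into a std::vector. The exact growth factor is implementation-defined, so the numbers you observe (1.5x, 2x, or something else) depend entirely on your standard library:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Prints each capacity change as elements are pushed, making the
// geometric growth policy of this standard library's vector visible.
int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
            last_capacity = v.capacity();
        }
    }
}
```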

When we do resize by a constant factor like that (rather than, for example, always adding a specific size of chunk, such as growing in 4Kb increments) we get what's called amortized constant time addition. In other words, as the array grows, resizing happens exponentially less often. The average number of times items in the array have been copied tends to a constant (usually around 3, but it depends on the growth factor you use).
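A small back-of-the-envelope simulation (assuming a 1.5x growth factor and a starting capacity of 1, both arbitrary choices for illustration) shows the copies-per-element ratio settling near a constant rather than growing with the number of insertions:

```cpp
#include <cstddef>
#include <iostream>

// Simulates growing a dynamic array by a factor of 1.5 and counts how many
// element copies the reallocations perform in total. The ratio of copies to
// elements approaches a constant (about 2 for a 1.5x policy in this simple
// model), which is what "amortized constant time" refers to.
int main() {
    const double growth = 1.5;          // assumed growth factor
    std::size_t capacity = 1;
    std::size_t total_copies = 0;
    const std::size_t n_items = 10'000'000;
    for (std::size_t n = 1; n <= n_items; ++n) {
        if (n > capacity) {                        // reallocation needed
            total_copies += n - 1;                 // move existing elements
            capacity = static_cast<std::size_t>(capacity * growth) + 1;
        }
    }
    std::cout << "copies per element: "
              << static_cast<double>(total_copies) / n_items << '\n';
}
```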

Linked list allocation

Using a linked list, the situation is rather different. We never see resizing, so we don't see extra time or memory usage for some insertions. At the same time, we do see extra time and memory used essentially all the time. In particular, each node in the linked list needs to contain a pointer to the next node. Depending on the size of the data in the node compared to the size of a pointer, this can lead to significant overhead. For example, let's assume you need a stack of ints. In a typical case where an int is the same size as a pointer, that's going to mean 50% overhead -- all the time. It's increasingly common for a pointer to be larger than an int; twice the size is fairly common (64-bit pointer, 32-bit int). In such a case, you have ~67% overhead -- i.e., obviously enough, each node devotes twice as much space to the pointer as to the data being stored.
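A quick way to see this per-node overhead is to look at the size of a hand-rolled stack node. On a typical 64-bit platform the sketch below reports 4 bytes of payload, an 8-byte pointer, and a 16-byte node once alignment padding is included; your platform's numbers may differ:

```cpp
#include <iostream>

// A singly linked stack node holding one int. The next-pointer (plus any
// alignment padding) is pure overhead relative to the payload.
struct Node {
    int value;
    Node* next;
};

int main() {
    std::cout << "sizeof(int)   = " << sizeof(int) << '\n';
    std::cout << "sizeof(Node*) = " << sizeof(Node*) << '\n';
    std::cout << "sizeof(Node)  = " << sizeof(Node) << '\n';  // padding included
}
```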

Unfortunately, that's often just the tip of the iceberg. In a typical linked list, each node is dynamically allocated individually. At least if you're storing small data items (such as int), the memory allocated for a node may be (and usually will be) even larger than the amount you actually request. So -- you ask for 12 bytes of memory to hold an int and a pointer -- but the chunk of memory you get is likely to be rounded up to 16 or 32 bytes instead. Now you're looking at overhead of at least 75% and quite possibly ~88%.

As far as speed goes, the situation is rather similar: allocating and freeing memory dynamically is often quite slow. The heap manager typically has blocks of free memory, and has to spend time searching through them to find the block that's most suited to the size you're asking for. Then it (typically) has to split that block into two pieces, one to satisfy your allocation, and another of the remaining memory it can use to satisfy other allocations. Likewise, when you free memory, it typically goes back to that same list of free blocks and checks whether there's an adjoining block of memory already free, so it can join the two back together.
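The rough benchmark sketch below contrasts pushing N ints onto a std::vector, which reallocates only occasionally, with pushing them onto a std::list, which performs one heap allocation per element. The absolute timings depend entirely on your compiler, standard library, and allocator, so treat the numbers as illustrative only:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <list>
#include <vector>

// Times n push_back calls on an arbitrary container.
template <typename Container>
long long time_pushes(std::size_t n) {
    using clock = std::chrono::steady_clock;
    Container c;
    auto start = clock::now();
    for (std::size_t i = 0; i < n; ++i)
        c.push_back(static_cast<int>(i));
    auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main() {
    const std::size_t n = 10'000'000;
    std::cout << "vector: " << time_pushes<std::vector<int>>(n) << " ms\n";
    std::cout << "list:   " << time_pushes<std::list<int>>(n) << " ms\n";
}
```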

Allocating and managing lots of blocks of memory is expensive.

Cache usage

Finally, with recent processors we run into another important factor: cache usage. In the case of a vector, we have all the data right next to each other. Then, after the end of the part of the vector that's in use, we have some empty memory. This leads to excellent cache usage -- the data we're using gets cached; the data we're not using has little or no effect on the cache at all.

With a linked list, the pointers (and probable overhead in each node) are distributed throughout our list. I.e., each piece of data we care about has, right next to it, the overhead of the pointer, and the empty space allocated to the node that we're not using. In short, the effective size of the cache is reduced by about the same factor as the overall overhead of each node in the list -- i.e., we might easily see only 1/8th of the cache storing the data we care about, and 7/8ths devoted to storing pointers and/or pure garbage.
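A similarly rough sketch of the traversal side: summing the same N ints stored contiguously versus node-by-node does identical arithmetic per element, so any large gap in the timings is mostly down to layout and cache behaviour. As before, the numbers are only illustrative:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

// Times a full traversal (sum) of a container; the volatile sink keeps the
// compiler from optimizing the loop away.
template <typename Container>
long long time_sum(const Container& c) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    volatile long long sum = std::accumulate(c.begin(), c.end(), 0LL);
    (void)sum;
    auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

int main() {
    const std::size_t n = 10'000'000;
    std::vector<int> v(n, 1);  // contiguous storage
    std::list<int> l(n, 1);    // one node per element
    std::cout << "vector sum: " << time_sum(v) << " ms\n";
    std::cout << "list sum:   " << time_sum(l) << " ms\n";
}
```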

Summary

A linked list can work well when you have a relatively small number of nodes, each of which is individually quite large. If (as is more typical for a stack) you're dealing with a relatively large number of items, each of which is individually quite small, you're much less likely to see a savings in time or memory usage. Quite the contrary: for such cases, a linked list is much more likely to basically waste a great deal of both time and memory.
