如果我们使用链表实现桶,桶排序的复杂度是 O(n+k) 吗? [英] How is the complexity of bucket sort is O(n+k) if we implement buckets using linked lists?

查看:13
本文介绍了如果我们使用链表实现桶,桶排序的复杂度是 O(n+k) 吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果我们使用链表实现的桶,我很好奇为什么桶排序的运行时间是 O(n + k).例如,假设我们有这样的输入:

I am curious about why bucket sort has a runtime of O(n + k) if we use buckets implemented with linked lists. For example, suppose that we have this input:

n = no of element= 8
k = range = 3

array = 2,2,1,1,1,3,1,3

桶看起来像这样:

1: 1 -> 1 -> 1 -> 1
2: 2 -> 2
3: 3 -> 3

插入这些桶所花费的总时间是 O(n),假设我们在链表中存储了一个尾指针.

The total time spent inserting into these buckets is O(n), assuming that we store a tail pointer in the linked lists.

要删除,我们必须转到每个存储桶,然后删除该存储桶中的每个节点.因此,当我们遍历每个链表时,复杂度应该是 O(K * 桶链表的平均长度).

For deleting we have to go to each bucket and then delete each node in that bucket. Hence complexity should be O(K * average length of link list of bucket) as we are traversing each linked list.

但是,我读到桶排序的复杂度是 O(n + k).为什么这与我的分析不一致?请纠正我,因为我仍在学习计算复杂性.

However, I read that bucket sort's complexity is O(n + k). Why doesn't this agree with my analysis? Please correct me as I am still learning computational complexity.

推荐答案

您的分析几乎是正确的,但您遗漏了一个重要的细节.

Your analysis is almost correct, but there's an important detail that you're missing.

现在,您是正确的,遍历输入数组以将元素分布到桶中需要时间 O(n).但是,当您说组装数组所需的总时间为 O(k * (每个存储桶的平均元素数)) 时,您就有点不对劲了.请注意,因为有 n 个元素和 k 个存储桶,所以总运行时间为 O(k * (n/k)) = O(n),总运行时间为 O(n).要了解为什么真正的答案是 O(n + k),我们需要更仔细地查看大 O 项.

Right now, you are correct that iterating across the input array to distribute the elements into buckets takes time O(n). However, you are slightly off when you say that the total amount of time required to assemble the array is O(k * (average number of elements per bucket)). Note that because there are n elements and k buckets, this would come out to O(k * (n / k)) = O(n), for a total runtime of O(n). To see why the real answer is O(n + k), we need to look more carefully at that big-O term.

对于初学者来说,您在每个存储桶上花费的平均时间为 O(n/k) 是绝对正确的.然后你说因为有 k 个存储桶,所以总运行时间为 O(k * (n/k)) = O(n).然而,这是不正确的:特别是,k * O(n/k) = O(n) 是正确的.这样做的原因是 O(n/k) 项隐藏了一个常数因子.当您访问每个存储桶并查看其中包含的元素时,它不需要恰好 n/k 时间,甚至不需要 n/k 时间的某个常数倍数.例如,如果桶为空会发生什么?在这种情况下,您仍然需要花一些时间查看存储桶,因为您必须确定不应该迭代它的元素.因此,每个桶所需时间的更准确表示类似于 c0(n/k) + c1,其中 c0 和c1 是特定于实现的常量.这个表达式当然是 O(n/k).

For starters, you are absolutely right that the average amount of time that you spend on each bucket is O(n / k). You then say that since there are k buckets, the total runtime is then O(k * (n / k)) = O(n). However, this is incorrect: in particular, it is not true that k * O(n / k) = O(n). The reason for this is that the term O(n / k) is hiding a constant factor. When you visit each bucket and take a look at the elements it contains, it doesn't take exactly n / k time, or even some constant multiple of n / k time. For example, what happens if the bucket is empty? In that case, you're still spending some amount of time looking at the bucket, since you have to determine that you shouldn't iterate over its elements. Thus a more accurate representation of the time required per bucket is something like c0(n / k) + c1, where c0 and c1 are implementation-specific constants. This expression is, of course, O(n / k).

问题是当您将此表达式乘以 k 以获得完成的工作总量时会发生什么.如果你计算

The catch is what happens when you multiply this expression by k to get the total amount of work done. If you compute

k * (c0(n/k) + c1)

k * (c0(n / k) + c1)

你得到

c0n + c1k

如您所见,该表达式直接取决于 k,因此总运行时间为 O(n + k).

As you can see, this expression depends directly on k, so the total runtime is O(n + k).

获得此结果的更直接方法是查看桶排序第二步的代码,如下所示:

A more direct way to arrive at this result would be to look at the code for the second step of the bucket sort, which looks like this:

For each bucket b:
    For each element x in b:
        Append x to the array.

总共完成了多少工作?好吧,有 k 个不同的桶,所以最外层的循环必须至少花费 O(k) 时间,因为我们必须查看每个桶.在内部,内部循环总共将执行 O(n) 次,因为总共有 n 个元素分布在存储桶中.由此,我们得到 O(n + k) 的总运行时间.

How much work is done overall? Well, there are k different buckets, so the outermost loop must take at least O(k) time, because we have to look in each bucket. Inside, the inner loop will execute a total of O(n) times overall, because there are a total of n elements distributed across the buckets. From this, we get the O(n + k) total runtime.

这很重要的原因是,如果您尝试使用大量存储桶(例如,远大于 n)进行存储桶排序,运行时可能会被扫描所有数据所需的时间所支配.寻找您实际使用的存储桶,即使其中大部分是空的.基数排序有用的原因是它使用桶排序的多次迭代,其中只有两个桶,其运行时间为 O(n + 2) = O(n).由于您只需要执行 O(lg U) 次迭代(其中 U 是数组中的最大值),因此运行时间为 O(n lg U) 而不是您从存储桶中获得的 O(n + U)排序,这是更糟糕的.

The reason that this is important is that it means that if you try doing a bucket sort with a huge number of buckets (say, much greater than n), the runtime might be dominated by the time required to scan over all the buckets looking for the buckets that you actually used, even if most of them are empty. The reason that radix sort is useful is that it uses multiple iterations of bucket sort where there are only two buckets, which runs in time O(n + 2) = O(n). Since you only need to do O(lg U) iterations of this (where U is the maximum value in the array), the runtime is O(n lg U) instead of the O(n + U) you'd get from bucket sort, which is much worse.

希望这有帮助!

这篇关于如果我们使用链表实现桶,桶排序的复杂度是 O(n+k) 吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆