归并排序中的比较次数 [英] Number of Comparisons in Merge-Sort

查看:154
本文介绍了归并排序中的比较次数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在研究合并排序主题时遇到了这个概念,即合并排序中的比较次数(在最坏的情况下,根据 维基百科) 等于 (n ⌈lg n⌉ - 2⌈lg n⌉ + 1);事实上,它介于 (n lg n - n + 1) 和 (n lg n + n + O(lg n)) 之间.问题是我无法弄清楚这些复杂性试图说什么.我知道 O(nlogn) 是归并排序的复杂度,但比较次数?

I was studying the merge-sort subject that I ran into this concept that the number of comparisons in merge-sort (in the worst-case, and according to Wikipedia) equals (n ⌈lg n⌉ - 2⌈lg n⌉ + 1); in fact it's between (n lg n - n + 1) and (n lg n + n + O(lg n)). The problem is that I cannot figure out what these complexities try to say. I know O(nlogn) is the complexity of merge-sort but the number of comparisons?

推荐答案

为什么要计算比较

任何排序算法基本上都有两种操作:比较数据和移动数据.在许多情况下,比较比搬家更昂贵.想想基于引用的类型系统中的长字符串:移动数据将简单地交换指针,但比较可能需要在找到第一个差异之前迭代字符串的大部分公共部分.所以从这个意义上说,比较可能是重点操作.

Why to count comparisons

There are basically two operations to any sorting algorithm: comparing data and moving data. In many cases, comparing will be more expensive than moving. Think about long strings in a reference-based typing system: moving data will simply exchange pointers, but comparing might require iterating over a large common part of the strings before the first difference is found. So in this sense, comparison might well be the operation to focus on.

数字似乎更详细:不是简单地给出一些朗道符号(大哦符号)来表示复杂性,而是得到一个实际数字.一旦您确定了基本操作是什么,例如这种情况下的比较,这种实际计数操作的方法就变得可行了.这在比较由朗道符号隐藏的常数时,或在检查小输入的非渐近情况时尤为重要.

The numbers appear to be more detailed: instead of simply giving some Landau symbol (big-Oh notation) for the complexity, you get an actual number. Once you have decided what a basic operation is, like a comparison in this case, this approach of actually counting operations becomes feasible. This is particularly important when comparing the constants hidden by the Landau symbol, or when examining the non-asymptotic case of small inputs.

请注意,在整个讨论中,lg 表示以 2 为底的对数.当您对 n 元素进行合并排序时,您有 ⌈lg n⌉ 个合并级别.假设您将 ⌈lg n⌉ 硬币放在要排序的每个元素上,并且合并需要一个硬币.这肯定足以支付所有合并的费用,因为每个元素都将包含在 ⌈lg n⌉ 合并中,并且每次合并所进行的比较不会超过所涉及元素的数量.所以这是公式中的 n⌈lg n⌉.

Note that throughout this discussion, lg denotes the logarithm with base 2. When you merge-sort n elements, you have ⌈lg n⌉ levels of merges. Assume you place ⌈lg n⌉ coins on each element to be sorted, and a merge costs one coin. This will certainly be enough to pay for all the merges, as each element will be included in ⌈lg n⌉ merges, and each merge won't take more comparisons than the number of elements involved. So this is the n⌈lg n⌉ from your formula.

作为两个长度为 mn 的数组的合并只需要 m + n − 1 次比较,最后你还有硬币,每次合并一个.让我们暂时假设我们所有的数组长度都是 2 的幂,即你总是有 m = n.那么合并的总数是 n − 1(2 的幂之和).利用 n 是 2 的幂这一事实,这也可以写成 2⌈lg n − 1,然后减去该数字从所有硬币的数量中返回的硬币产量 n⌈lg n⌉ − 2⌈lg n +1 根据需要.

As a merge of two arrays of length m and n takes only m + n − 1 comparisons, you still have coins left at the end, one from each merge. Let us for the moment assume that all our array lengths are powers of two, i.e. that you always have m = n. Then the total number of merges is n − 1 (sum of powers of two). Using the fact that n is a power of two, this can also be written as 2⌈lg n − 1, and subtracting that number of returned coins from the number of all coins yields n⌈lg n⌉ − 2⌈lg n + 1 as required.

如果 n 是 1 的小于 2 的幂,则有 ⌈lg n⌉ 合并,其中涉及的元素少一个.这包括合并两个单元素列表,这些列表过去只需要一枚硬币,现在完全消失了.因此,总成本减少了⌈lg n⌉,这正是如果 n 是 2 的幂时您在最后一个元素上放置的硬币数量.所以你必须在前面放置更少的硬币,但你会得到相同数量的硬币.这就是为什么公式有 2⌈lg n 而不是 n 的原因:除非你降到更小的幂,否则值保持不变两个.如果 n 和下一个 2 的幂之间的差大于 1,则相同的论点成立.

If n is 1 less than a power of two, then there are ⌈lg n⌉ merges where one element less is involved. This includes a merge of two one-element lists which used to take one coin and which now disappears altogether. So the total cost reduces by ⌈lg n⌉, which is exactly the number of coins you'd have placed on the last element if n were a power of two. So you have to place fewer coins up front, but you get back the same number of coins. This is the reason why the formula has 2⌈lg n instead of n: the value remains the same unless you drop to a smaller power of two. The same argument holds if the difference between n and the next power of two is greater than 1.

总的来说,这导致了维基百科中给出的公式:

On the whole, this results in the formula given in Wikipedia:

n ⌈lg n⌉ − 2⌈lg n + 1

n ⌈lg n⌉ − 2⌈lg n + 1

注意:我对上述证明非常满意.对于那些喜欢我的公式的人,请随意分发它,但不要忘记将它归于我作为 许可 需要.

为了证明下界公式,让我们写 ⌈lg n⌉ = lg n + d 0 ≤ d <1.现在上面的公式可以写成
n (lg n + d) − 2lg n + d + 1 =n lg n + ndn2d + 1 =n lg nn(2dd) + 1 ≥n lg nn + 1
其中不等式成立是因为 2dd ≤ 1 for 0 ≤ d <1

To proove the lower bound formula, let's write ⌈lg n⌉ = lg n + d with 0 ≤ d < 1. Now the formula above can be written as
n (lg n + d) − 2lg n + d + 1 = n lg n + ndn2d + 1 = n lg nn(2dd) + 1 ≥ n lg nn + 1
where the inequality holds because 2dd ≤ 1 for 0 ≤ d < 1

我必须承认,我很困惑为什么有人会命名 n lg n + n + O(lg n) 作为上限.即使您想避免使用 floor 函数,上面的计算也表明 n lg n − 0.9n + 1 之类的东西作为更严格的上限为精确公式而定.2dd 有它的最小值 (ln(ln(2)) + 1)/ln(2) ≈ 0.914 for d = −ln(ln(2))/ln(2) ≈ 0.529.

I must confess, I'm rather confused why anyone would name n lg n + n + O(lg n) as an upper bound. Even if you wanted to avoid the floor function, the computation above suggests something like n lg n − 0.9n + 1 as a much tighter upper bound for the exact formula. 2dd has its minimum (ln(ln(2)) + 1)/ln(2) ≈ 0.914 for d = −ln(ln(2))/ln(2) ≈ 0.529.

我只能猜测引用的公式出现在某些出版物中,或者作为该算法的一个相当松散的界限,或者作为与该算法进行比较的其他算法的确切比较次数.

I can only guess that the quoted formula occurs in some publication, either as a rather loose bound for this algorithm, or as the exact number of comparisons for some other algorithm which is compared against this one.

此问题已通过以下评论解决;一个公式最初被错误引用.

等于 (n lg n - n + 1);实际上它介于 (n lg n - n + 1) 和 (n lg n + n + O(lg n)) 之间

equals (n lg n - n + 1); in fact it's between (n lg n - n + 1) and (n lg n + n + O(lg n))

如果第一部分是真的,第二部分也是微不足道的,但是明确说明上限似乎没有意义.我自己没有看过细节,但是这两个陈述像这样放在一起看起来很奇怪.要么第一个真的是真的,在这种情况下我会省略第二个,因为它只会令人困惑,或者第二个是真的,在这种情况下,第一个是错误的,应该省略.

If the first part is true, the second is trivially true as well, but explicitely stating the upper bound seems kind of pointless. I haven't looked at the details myself, but these two statements appear strange when taken together like this. Either the first one really is true, in which case I'd omit the second one as it is only confusing, or the second one is true, in which case the first one is wrong and should be omitted.

这篇关于归并排序中的比较次数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆