在合并排序比较数 [英] Number of Comparisons in Merge-Sort

查看:140
本文介绍了在合并排序比较数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究合并排序题目,我就遇到了这个概念,在合并排序比较(在最坏情况下的数量,并根据的维基百科)为(N⌈lgn⌉ - 2 ⌈lgn⌉ + 1);实际上它是(N LG N - N + 1)之间和(n LG N + N + O(LG N))。现在的问题是,我不能找出这些复杂试着说。我知道O(nlogn)是的合并排序的复杂性,但比较次数?

I was studying the merge-sort subject that I ran into this concept that the number of comparisons in merge-sort (in the worst-case, and according to Wikipedia) equals (n ⌈lg n⌉ - 2⌈lg n⌉ + 1); in fact it's between (n lg n - n + 1) and (n lg n + n + O(lg n)). The problem is that I cannot figure out what these complexities try to say. I know O(nlogn) is the complexity of merge-sort but the number of comparisons?

推荐答案

基本上有两种操作任何排序算法:比较数据和移动数据。在许多情况下,在比较将比移动更加昂贵。想想长字符串的参考分型系统:移动数据将只是交换指针,但比较可能需要遍历字符串的大型公共部分的第一个差别在于前。所以在这个意义上说,比较可能不失为操作集中于

Why to count comparisons

There are basically two operations to any sorting algorithm: comparing data and moving data. In many cases, comparing will be more expensive than moving. Think about long strings in a reference-based typing system: moving data will simply exchange pointers, but comparing might require iterating over a large common part of the strings before the first difference is found. So in this sense, comparison might well be the operation to focus on.

数字显得更细致:而不是简单地提供一些朗符号(大哦符号)的复杂性,你会得到一个实际数目。一旦决定的基本操作是什么,像在这种情况下进行比较,实际计数操作的这种方法变得可行。比较隐藏在朗道符号的常数时,或检查的小输入的非渐进情况时,这一点尤为重要。

The numbers appear to be more detailed: instead of simply giving some Landau symbol (big-Oh notation) for the complexity, you get an actual number. Once you have decided what a basic operation is, like a comparison in this case, this approach of actually counting operations becomes feasible. This is particularly important when comparing the constants hidden by the Landau symbol, or when examining the non-asymptotic case of small inputs.

请注意,在整个讨论中,LG表示带底座2.如果合并排序对数 N 的元素,你有⌈lg N 的⌉级别合并的。假设你把⌈lg每个元素的 N ⌉硬币进行排序,以及合并成本一枚硬币。这肯定会足以支付所有的合并,因为每个元素将被包含在⌈lg N ⌉合并,以及每个合并将不会超过所涉及的元素数量更多的比较。因此,这是 N ⌈lg N 的⌉从您的公式。

Note that throughout this discussion, lg denotes the logarithm with base 2. When you merge-sort n elements, you have ⌈lg n⌉ levels of merges. Assume you place ⌈lg n⌉ coins on each element to be sorted, and a merge costs one coin. This will certainly be enough to pay for all the merges, as each element will be included in ⌈lg n⌉ merges, and each merge won't take more comparisons than the number of elements involved. So this is the n⌈lg n⌉ from your formula.

由于长度的两个数组合并 M N 仅需 M + N - 1比较,你仍然有留底,分别来自合并硬币。让我们暂时假设所有的数组的长度是两个,即权力,你总是有 M = N 。然后合并的总数为 N - 1(2的幂次方之和)。使用的 N 的是二的幂,这也可以写为2 ⌈lg名词的⌉功能的事实 - 1,并减去该号码从所有的硬币产量数返回硬币的 N ⌈lg N ⌉ - 2 ⌈lg N + 1根据需要

As a merge of two arrays of length m and n takes only m + n − 1 comparisons, you still have coins left at the end, one from each merge. Let us for the moment assume that all our array lengths are powers of two, i.e. that you always have m = n. Then the total number of merges is n − 1 (sum of powers of two). Using the fact that n is a power of two, this can also be written as 2⌈lg n − 1, and subtracting that number of returned coins from the number of all coins yields n⌈lg n⌉ − 2⌈lg n + 1 as required.

如果 N 是比2的功率小于1,则有⌈lg N 的⌉合并,其中一个元素较少涉及。这包括两个一元列出了过去需要一枚硬币,以及将合并现在完全消失。因此,总成本⌈lg减少 N 的⌉,这正是你所放置的最后一个元素上的硬币的数目,如果 N 是二的幂。所以,你必须把硬币少了前面,但你回来的硬币数相同。这就是为什么该公式有2 ⌈lg N 而不是 N :除非你降低到一个较小的功率值保持不变二。同样的论点成立的话之间的差别 N 和2的下一个功率大于1。

If n is 1 less than a power of two, then there are ⌈lg n⌉ merges where one element less is involved. This includes a merge of two one-element lists which used to take one coin and which now disappears altogether. So the total cost reduces by ⌈lg n⌉, which is exactly the number of coins you'd have placed on the last element if n were a power of two. So you have to place fewer coins up front, but you get back the same number of coins. This is the reason why the formula has 2⌈lg n instead of n: the value remains the same unless you drop to a smaller power of two. The same argument holds if the difference between n and the next power of two is greater than 1.

在整体,这导致维基百科给出的公式:

On the whole, this results in the formula given in Wikipedia:

N ⌈lg N ⌉ - 2 ⌈lg N + 1

n ⌈lg n⌉ − 2⌈lg n + 1

注:我是pretty的快乐与上述证明。对于那些谁喜欢我的配方,随意分发,但不要忘了把它归因于我如许可证需要。

要坡口下界公式,让我们写⌈lg N ⌉= LG电子 N + ð替换0≤ð< 1.现在上面的公式可以写成
N (LG N + ð) - 2 LG N + ð + 1 = N LG N + 第二 - N 2 ð + 1 = N LG N - N (2 ð - ð)+ 1≥ N LG N - N + 1
其中,不等式成立,因为2 ð - ð≤1 0≤ð< 1

To proove the lower bound formula, let's write ⌈lg n⌉ = lg n + d with 0 ≤ d < 1. Now the formula above can be written as
n (lg n + d) − 2lg n + d + 1 = n lg n + ndn2d + 1 = n lg nn(2dd) + 1 ≥ n lg nn + 1
where the inequality holds because 2dd ≤ 1 for 0 ≤ d < 1

我必须承认,我比较混乱,为什么有人会说出 N LG N + N + O(LG电子名词的),其为上限。即使你想避免地板函数,计算上面提出像 N LG N - 0.9 N + 1为一个更紧密的上开往精确公式。 2 ð - ð的具有最小(LN(LN(2))+ 1)/ LN(2)≈0.914 ð = -ln(LN(2))/ LN(2)≈0.529。

I must confess, I'm rather confused why anyone would name n lg n + n + O(lg n) as an upper bound. Even if you wanted to avoid the floor function, the computation above suggests something like n lg n − 0.9n + 1 as a much tighter upper bound for the exact formula. 2dd has its minimum (ln(ln(2)) + 1)/ln(2) ≈ 0.914 for d = −ln(ln(2))/ln(2) ≈ 0.529.

我只能猜测,引用的公式出现在一些出版物,无论是作为一个相当宽松开往这个算法,或者是比较反对这个其他一些算法进行比较的确切人数。

I can only guess that the quoted formula occurs in some publication, either as a rather loose bound for this algorithm, or as the exact number of comparisons for some other algorithm which is compared against this one.

此问题已得到解决由下面的评论;一式中,原本被引述有误。

为(N LG N - N + 1);实际上它是(N LG N - N + 1)之间和(n LG N + N + O(LG N))

equals (n lg n - n + 1); in fact it's between (n lg n - n + 1) and (n lg n + n + O(lg n))

如果第一部分是真实的,第二个是平凡也是如此,但明确地指出了上限,似乎一种毫无意义的。我没有看过的细节我自己,但是当这样加在一起这两个语句出现奇怪。任第一个确实是真的,在这种情况下我省略第二之一,因为它是唯一的混乱,或第二个是真实的,在这种情况下,第一个是错误的,应当被省略。

If the first part is true, the second is trivially true as well, but explicitely stating the upper bound seems kind of pointless. I haven't looked at the details myself, but these two statements appear strange when taken together like this. Either the first one really is true, in which case I'd omit the second one as it is only confusing, or the second one is true, in which case the first one is wrong and should be omitted.

这篇关于在合并排序比较数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆