哪些是对&QUOT规则;Ω(N log n)的屏障QUOT;对于排序算法? [英] What are the rules for the "Ω(n log n) barrier" for sorting algorithms?

查看:164
本文介绍了哪些是对&QUOT规则;Ω(N log n)的屏障QUOT;对于排序算法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写的排序为O(n)的一个简单的程序。这是非常低效的内存,但是这不是问题的关键。

I wrote a simple program that sorts in O(n). It is highly memory inefficient, but that's not the point.

它使用的原则后面的的HashMap 进行排序:

It uses the principle behind a HashMap for sorting:

public class NLogNBreak {
    public static class LinkedListBack {
        public LinkedListBack(int val){
            first = new Node();
            first.val = val;
        }
        public Node first = null;
        public void insert(int i){
            Node n = new Node();
            n.val = i;
            n.next = first;
            first = n;
        }
    }

    private static class Node {
        public Node next = null;
        public int val;
    }

    //max > in[i] > 0
    public static LinkedListBack[] sorted(int[] in, int max){
        LinkedListBack[] ar = new LinkedListBack[max + 1];
        for (int i = 0; i < in.length; i++) {
            int val = in[i];
            if(ar[val] == null){
                ar[val] = new LinkedListBack(val);
            } else {
                ar[val].insert(val);
            }
        }
        return ar;
    }
}

那么,这算不算一个排序为O(n),即使它返回的结果中一个时髦的格式?

So does this count as a sort of O(n), even though it returns the result in a funky format?

推荐答案

要直接回答你的问题:

  1. 您的排序算法在技术上是不是O(N),而是为O(n +最大),因为你需要创建规模最大,这需要O(最大)时间的数组。
  2. 这不是一个问题;其实,这是一个众所周知的排序算法,打破了与欧米茄的特殊情况;(N log n)的屏障

那么,什么是这与欧米茄(N log n)的障碍?它从何而来?你如何打破它?

So what is this Ω(n log n) barrier? Where does it come from? And how do you break it?

&将欧米茄(N log n)的障碍是信息理论下限的任何的基于比较的的平均情况速度排序算法。如果您允许的唯一操作适用于数组元素,以区分它们是进行某种比较,那么你的排序算法不能做得比与欧米茄好。(N日志N)的平均情况

The Ω(n log n) barrier is the information-theoretical lower bound on the average-case speed of any comparison-based sorting algorithm. If the only operations you are permitted to apply to array elements to distinguish them is to perform some sort of comparison, then your sorting algorithm can't do better than Ω(n log n) in the average-case.

要理解这是为什么,让我们想想的状态的算法在其执行过程中的任何一点。随着算法的运行,它可以获取有关的方式来输入元素被勒令一些信息的数量。比方说,如果该算法有一定的一套关于输入元素的原始顺序信息的X,则算法是X国。

To understand why this is, let's think about the state of the algorithm at any point during its execution. As the algorithm is running, it can gain some amount of information about the way that the input elements were ordered. Let's say that if the algorithm has some set of information X about the original ordering of the input elements, then the algorithm is in state X.

在与欧米茄的关键;(N log n)的参数(和一些相关的参数,因为我将在后面讨论)是,该算法必须有进入一个大量基于不同状态的能力是什么输入。让我们现在假设输入到排序算法是n个不同的值的阵列。由于该算法不能告诉这些元素比他们订购的方式与其他任何东西,它并不真正的问题是什么被排序的值。所有重要的是那些n个元素相对于彼此的相对顺序。

The crux of the Ω(n log n) argument (and several related arguments, as I'll discuss later) is that the algorithm has to have the ability to get into a large number of different states based on what the input is. Let's assume for now that the input to the sorting algorithm is an array of n distinct values. Because the algorithm can't tell anything about those elements other than the way that they're ordered, it doesn't really matter what the values being sorted are. All that matters is the relative ordering of those n elements relative to one another.

现在的关键一步 - 让我们假设有是f订购n个输入元素(N)独特的方法和假设,我们的排序算法都进不了,至少F(N)不同的状态。如果是这种情况,则必须有该元件的两个不同的排序阵列中,该算法总是组在一起成为相同的状态。如果发生这种情况,那么排序算法不可能正确地正确地排序两个两个输入数组中。这样做的原因是,因为该算法将两个阵列相同,任何步骤,使用它来重新排序所述第一阵列的元素将是相同的,因为它使用重新排列第二数组的元素的步骤。由于两个数组是不一样的,必须有这将是不合时宜的,在这两种情况中的一个的至少一个元素。因此,我们知道,排序算法必须能够进入F(N)不同的状态。

Now for the key step - let's suppose that there are f(n) unique ways of ordering the n input elements and suppose that our sorting algorithm can't get into at least f(n) different states. If this is the case, then there has to be two different orderings of the elements in the array that the algorithm always groups together into the same state. If this happens, then the sorting algorithm can't possibly correctly sort both of the two input arrays correctly. The reasoning behind this is that because the algorithm treats the two arrays identically, whatever steps it uses to reorder the elements of the first array will be the same as the steps it uses to reorder the elements of the second array. Since the two arrays aren't the same, there has to be at least one element that will be out of place in one of the two cases. Consequently, we know that the sorting algorithm has to be able to get into f(n) different states.

但是,如何才能算法进入这些不同的状态?好吧,让我们想想这一点。最初,该算法没有关于元素的排序信息的。当它使得它的第一个比较(比如元素之间的[i]和A [J]),该算法可以进入两种状态 - 一个其中A [1] - ; A [J],一个其中A [I]> A [J]。更一般地,每一个比较,该算法使得能够,在最好的情况下,就把该算法到基于所述比较的结果两个新的状态之一。因此,我们可以把一个大的二进制树结构描述的状态,该算法可以在 - 每个国家最多有两个孩子,描述什么状态算法进入基于多数民众赞成做了比较的结果。如果我们从树上下到叶子的根部任何路径,我们得到了一系列最终取得过得去的算法在一个特定的输入比较。为了尽可能快地排序,我们要尽可能少的比较可能的,所以我们希望这个树状结构具有最小高度的可能。

But how can the algorithm get into these different states? Well, let's think about this. Initially, the algorithm has no information at all about the ordering of the elements. When it makes its first comparison (say, between elements A[i] and A[j]), the algorithm can get into one of two states - one where A[i] < A[j] and one where A[i] > A[j]. More generally, every comparison that the algorithm makes can, in the best case, put the algorithm into one of two new states based on the result of the comparison. We can therefore think of a large binary tree structure describing the states that the algorithm can be in - each state has up to two children describing what state the algorithm gets into based on the result of the comparison that's made. If we take any path from the root of the tree down to a leaf, we get the series of comparisons that end up getting made by the algorithm on a particular input. In order to sort as quickly as possible, we want to make the fewest number of comparisons possible, and so we want this tree structure to have the smallest height possible.

现在,我们知道两件事情。首先,我们可以把所有的州,该算法可以得到进入一个二叉树。第二,二进制树具有至少有F(N)在它的不同节点。鉴于此,最小的二叉树,我们可以建立必须有高度至少与欧米茄(登录F(N))。这意味着,如果是f订购数组元素(N)不同的可能途径,我们不得不作出至少与欧米茄(登录F(n))的平均的比较,因为否则就'吨得到足够的进州别共

Now, we know two things. First, we can think of all of the states the algorithm can get into as a binary tree. Second, that binary tree has to have at least f(n) different nodes in it. Given this, the smallest possible binary tree we can build has to have height at least Ω(log f(n)). This means that if there are f(n) different possible ways of ordering the array elements, we have to make at least Ω(log f(n)) comparisons on average, since otherwise we can't get into enough differing states.

要得出结论的证明,你不能打与欧米茄(N log n)的,请注意,如果数组有n个不同元素在里面,然后有n!订购元件的不同可能方式。使用 斯特灵公式 ,我们有一个日志N! =欧米茄(N log n)的,所以我们不得不作出至少与欧米茄;(N日志N)的比较在平均情况下进行排序输入序列

To conclude the proof that you can't beat Ω(n log n), note that if the array has n distinct elements in it, then there are n! different possible ways of ordering the elements. using Stirling's approximation, we have that log n! = Ω(n log n), and so we have to make at least Ω(n log n) comparisons in the average case to sort the input sequence.

在我们刚才看到的上面,我们已经看到,如果有n个数组元素都是不同的,你不能用一个比较它们排序排序比与欧米茄快的话;(N log n)的。然而,这种假设起始不一定是有效的。我们想排序许多阵列可能会复制在其中的元素。例如,假设我要排序的数组仅零和一所组成的,像这样的阵列的位置:

In what we just saw above, we saw that if you have n array elements that are all distinct, you cannot sort them with a comparison sort any faster than Ω(n log n). However, this starting assumption isn't necessarily valid. Many arrays that we'd like to sort may have duplicated elements in them. For example, suppose that I want to sort arrays that are composed solely of zeros and ones, such as this array here:

 0 1 0 1 1 1 0 0 1 1 1

在这种情况下,是的没有的事实,有n!不同的阵列零和长度为n的那些的。事实上,也有只有2 N 他们。从我们上面的结果,这意味着,我们应该能够欧米茄排序&;采用纯粹基于比较的排序算法(n)时间;(2日志 N )=欧米茄。事实上,我们完全可以做到这一点;这里有一个如何,我们会做一个素描:

In this case, it is not true that there are n! different arrays of zeros and ones of length n. In fact, there are only 2n of them. From our result above, this means that we should be able to sort in Ω(log 2n) = Ω(n) time using a purely comparison-based sorting algorithm. In fact, we absolutely can do this; here's a sketch of how we'd do it:

  1. 看的第一个元素。
  2. 复制小于第一元素中的所有元素融入到所谓的少
  3. 数组
  4. 复制等于第一个元素的所有元素融入到所谓的平等
  5. 数组
  6. 复制所有元素大于第一元素融入所谓的更大的
  7. 数组
  8. 在串接所有这三个阵列一起在阶数,等于,大于。
  1. Look at the first element.
  2. Copy all elements less than the first element into an array called 'less'
  3. Copy all elements equal to the first element into an array called 'equal'
  4. Copy all elements greater than the first element into an array called 'greater'
  5. Concatenate all three of these arrays together in the order less, equal, greater.

要看到这个作品,如果是0就是我们的第一个元素,那么'少'数组将是空的,在平等的阵列将所有的零和'更大'阵列将拥有所有的人。它们连接起来,然后将所有零所有的人面前。否则,如果1是我们的第一个元素,那么阵列将举行为零,则等于阵列将举办的,而大于数组将为空。因此,他们的串联是全零其次是所有的人,因为必需的。

To see that this works, if 0 is our first element, then the 'less' array will be empty, the 'equal' array will have all the zeros, and the 'greater' array will have all the ones. Concatenating them then puts all the zeros before all the ones. Otherwise, if 1 is our first element, then the less array will hold the zeros, the equal array will hold the ones, and the greater array will be empty. Their concatenation is thus all zeros followed by all ones, as required.

在实践中,你不会使用这个算法(你会使用一个计数排序,如下文所述),但它表明你的确可以战胜和欧米茄(N log n)的一个基于比较的算法,如果可能的输入到算法数是小

In practice, you wouldn't use this algorithm (you'd use a counting sort, as described below), but it shows that you can indeed beat Ω(n log n) with a comparison-based algorithm if the number of possible inputs to the algorithm is small.

一些基于比较的排序算法是众所周知的非常迅速工作,有多个重复的值输入。例如,已知的是快速排序用特殊分区步骤的能需要输入数组中重复元素的优势。

Some comparison-based sorting algorithms are known to work very quickly on inputs that have multiple duplicated values. For example, it is known that Quicksort with a special partitioning step can take advantage of duplicated elements in the input array.

所有这些讨论都认为我们在谈论基于比较的排序,那里只允许在数组元素的操作是一个比较。然而,如果我们更多地了解什么元素,我们要进行排序,并可以在超越了简单的比较,这些元素进行操作,则没有上述边界举行了。我们打​​破了出发的假设,导致我们构建算法的所有状态的二进制树,所以没有理由怀疑这些边界还是会老。

All of this discussion has assumed that we're talking about comparison-based sorting, where the only permitted operation on array elements is a comparison. However, if we know more about what elements we're going to be sorting and can perform operations on those elements beyond simple comparisons, then none of the above bounds hold any more. We're breaking the starting assumptions that led us to construct a binary tree of all the states of the algorithm, and so there's no reason to suspect that those bounds will still old.

例如,如果你知道输入值从一个宇宙,只有画| U |一次使用一个聪明的算法,它的元素,那么你就可以在O排序(| | U + N)。通过创建开始| U |不同的的进入,我们可以把从原来的数组中的元素。然后,遍历整个数组,并分发所有的数组元素成相应的水桶。最后,访问每一个水桶,从最小的元素斗持份和含有最大元素的副本斗结束,然后串联在一起你找到值。例如,让我们来看看如何排序数组由值的1 - 5。如果我们有这样的首发阵:

For example, if you know that the input values are drawn from a universe that only has |U| elements in it, then you can sort in O(n + |U|) time using a clever algorithm. Start off by creating |U| different buckets into which we can place the elements from the original array. Then, iterate across the array and distribute all of the array elements into the corresponding bucket. Finally, visit each of the buckets, starting with the bucket holding copies of the smallest element and end with the bucket containing copies of the largest element, then concatenate together all of the values you find. For example, let's see how to sort arrays consisting of the values 1 - 5. If we have this starting array:

1 3 4 5 2 3 2 1 4 3 5

然后我们就可以把这些元素融入桶这样的:

Then we can put those elements into buckets like this:

Bucket     1  2  3  4  5
           -------------
           1  2  3  4  5
           1  2  3  4  5
                 3

迭代整个水桶和连接它们的值加在一起得出这样的:

Iterating across the buckets and concatenating their values together yields this:

1 1 2 2 3 3 3 4 4 5 5

其中,果然,是我们原来的数组排序的版本!这里的运行时间为O(n)的时间去和原来的数组元素分配到水桶,然后为O(n + | U |)的时间在所有投入要素重新走到一起斗迭代。注意,如果| U | = O(N),这个运行在O(n)的时间,打破了与欧米茄;(N log n)的排序屏障

which, sure enough, is a sorted version of our original array! The runtime here is O(n) time to go and distribute the original array elements into the buckets, then O(n + |U|) time to iterate across all the buckets putting the elements back together. Notice that if |U| = O(n), this runs in O(n) time, breaking the Ω(n log n) sorting barrier.

如果要排序的整数,则可以使用基数排序做得比这更好,它运行在为O(n LG | U |)。如果你正在处理原始的 INT S,LG | U |通常是32或64,所以这是非常快的。如果你愿意来实现一个特别棘手的数据结构,您可以使用面包车昂德博阿斯树排序整数从0至U - 通过利用一个事实,即整数位组成一组,可以在块被操纵的1时间为O(n LG LG U),再次

If you are sorting integers, you can do much better than this by using radix sort, which runs in O(n lg |U|). If you're dealing with primitive ints, lg |U| is usually 32 or 64, so this is extremely fast. If you're willing to implement a particularly tricky data structure, you can use a van Emde Boas Tree to sort integers from 0 to U - 1 in time O(n lg lg U), again by exploiting the fact that integers consist of groups of bits that can be manipulated in blocks.

同样,如果你知道你的元素是字符串,可以排序很快通过建立一个线索

Similarly, if you know that your elements are strings, you can sort very quickly by building a trie out of the strings, then iterating across the trie to rebuild all the strings. Alternatively, you could consider the strings as numbers written in a large base (say, base 128 for ASCII text) and then use one of the integer sorting algorithms from above.

在每一种情况下,你能打败的信息理论障碍的原因是,你打破障碍的开头的假设,即你只能申请比较。如果你可以把输入元素,数字或字符串,或其他任何东西,发现更多的结构,全盘皆输,你可以排序非常有效。

In each of these cases, the reason that you can beat the information-theoretic barrier is that you're breaking the barrier's starting assumption, namely that you can only apply comparisons. If you can treat the input elements as numbers, or as strings, or as anything else that reveals more structure, all bets are off and you can sort extremely efficiently.

希望这有助于!

这篇关于哪些是对&QUOT规则;Ω(N log n)的屏障QUOT;对于排序算法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆