在排序数组中找到一个[i] = i的最有效的方法是什么? [英] What would be the most efficient way to find a[i] = i in a sorted array?

查看:157
本文介绍了在排序数组中找到一个[i] = i的最有效的方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个数组 a [] ,最有效的方法是确定是否至少有一个元素 i 满足条件 a [i] == i

Given an array a[], what would be the most efficient way to determine whether or not at least one element i satisfies the condition a[i] == i?

数组中的所有元素并且不同,但它们不一定是整数类型(即它们可能是浮点类型)。

All the elements in the array are sorted and distinct, but they aren't necessarily integer types (i.e. they might be floating point types).

推荐答案

关于排序,不同和不一定是整数的相关性的权利要求。事实上,正确选择一个高效算法来解决这个问题取决于这些特性。如果我们可以知道数组中的值是不同的和完整的,则更有效的算法是可能的,而如果值可能是不区分的,则不需要有效的算法,无论它们是否是积分的。当然,如果数组没有被排序,你可以先排序(平均复杂度为O(n log n)),然后使用更高效的预排序算法(即排序数组),但在未排序情况下,简单地保留未排序的数组并通过直接比较线性时间(O(n))中的值来更有效率。注意,不管选择的算法如何,最好的性能是O(1)(当检查的第一个元素包含其索引值);在执行任何算法期间的任何时刻,我们可能遇到一个元素,其中 a [i] == i 在这一点,我们返回true;在这个问题中,在算法性能方面实际上重要的是如何快速地排除所有元素并声明没有这样的元素 a [i] 其中

Several people have made claims about the relevance of "sorted", "distinct" and "aren't necessarily integers". In fact, proper selection of an efficient algorithm to solve this problem hinges on these characteristics. A more efficient algorithm would be possible if we could know that the values in the array were both distinct and integral, while a less efficient algorithm would be required if the values might be non-distinct, whether or not they were integral. And of course, if the array was not already sorted, you could sort it first (at average complexity O(n log n)) and then use the more efficient pre-sorted algorithm (i.e. for a sorted array), but in the unsorted case it would be more efficient to simply leave the array unsorted and run through it directly comparing the values in linear time (O(n)). Note that regardless of the algorithm chosen, best-case performance is O(1) (when the first element examined contains its index value); at any point during execution of any algorithm we might come across an element where a[i] == i at which point we return true; what actually matters in terms of algorithm performance in this problem is how quickly we can exclude all elements and declare that there is no such element a[i] where a[i] == i.

这个问题没有说明 a []的排序顺序。 / code>,这是一个非常关键的缺失信息。如果它是上升的,最坏情况的复杂性将总是O(n),没有什么可以做,使最坏情况的复杂性更好。但是如果排序顺序是递减的,即使最坏情况的复杂性是O(log n):由于数组中的值是不同的和递减的,只有一个可能的索引 a [i] code>可以等于 i ,基本上所有你需要做的是一个二叉搜索找到交叉点(其中升序索引值越过下降的元素值,如果还有这样的交叉),并且确定在交叉点索引值 c a [c] == c $ c>。由于这很简单,我会继续假设排序顺序是升序。有趣的是,如果元素是整数,即使在升序的情况下也有类似的交叉的情况(虽然在升序的情况下可能有多个 a [i] == i match),所以如果元素是整数,二叉搜索也适用于升序情况,在这种情况下,即使最坏情况的性能也是O(log n)(参见面试问题 - 在排序数组X中搜索索引i,使得X [i] = i )。

The problem does not state the sort order of a[], which is a pretty critical piece of missing information. If it’s ascending, the worst-case complexity will always be O(n), there’s nothing we can do to make the worst-case complexity better. But if the sort order is descending, even the worst-case complexity is O(log n): since values in the array are distinct and descending, there is only one possible index where a[i] could equal i, and basically all you have to do is a binary search to find the crossover point (where the ascending index values cross over the descending element values, if there even is such a crossover), and determine if a[c] == c at the crossover point index value c. Since that’s pretty trivial, I’ll proceed assuming that the sort order is ascending. Interestingly if the elements were integers, even in the ascending case there is a similar "crossover-like" situation (though in the ascending case there could be more than one a[i] == i match), so if the elements were integers, a binary search would also be applicable in the ascending case, in which case even the worst-case performance would be O(log n) (see Interview question - Search in sorted array X for index i such that X[i] = i). But we aren’t given that luxury in this version of the problem.

这是我们如何解决这个问题:

Here is how we might solve this problem:

从第一个元素 a [0] 开始。如果它的值是 == 0 ,你发现一个满足 a [i] == i 的元素返回true。如果其值为 1 ,可能包含 1 1 ,所以你继续下一个索引。然而,如果 a [0]> = 1 ,你知道(因为值是不同的)条件 a [1] == 1 不可能是真的,因此您可以安全地跳过索引 1 。但你甚至可以做得更好:例如,如果 a [0] == 12 ,你知道(因为值按升序排序),不可能是在元素 a [13] 之前满足 a [i] == i 的任何元素。因为数组中的值可以是非整数,所以我们不能在这一点做任何进一步的假设,所以我们可以安全地跳过的下一个元素是 a [13] (例如 a [1] a [12] 可能都包含 12.000之间的值。 .. 13.000 ... ,使得 a [13] 13 ,因此我们必须检查它。

Begin with the first element, a[0]. If its value is == 0, you’ve found an element which satisfies a[i] == i so return true. If its value is < 1, the next element (a[1]) could possibly contain the value 1, so you proceed to the next index. If, however, a[0] >= 1, you know (because the values are distinct) that the condition a[1] == 1 cannot possibly be true, so you can safely skip index 1. But you can even do better than that: For example, if a[0] == 12, you know (because the values are sorted in ascending order) that there cannot possibly be any elements that satisfy a[i] == i prior to element a[13]. Because the values in the array can be non-integral, we cannot make any further assumptions at this point, so the next element we can safely skip to directly is a[13] (e.g. a[1] through a[12] may all contain values between 12.000... and 13.000... such that a[13] could still equal exactly 13, so we have to check it).

继续该过程产生如下的算法:

Continuing that process yields an algorithm as follows:

// Algorithm 1
bool algorithm1(double* a, size_t len)
{
    for (size_t i=0; i<len; ++i) // worst case is O(n)
    {
        if (a[i] == i)
            return true; // of course we could also return i here (as an int)...
        if (a[i] > i)
            i = static_cast<size_t>(std::floor(a[i]));
    }
    return false; // ......in which case we’d want to return -1 here (an int)
}

如果 a [] 中的许多值大于它们的索引值,那么这具有非常好的性能,并且如果所有值 a [] 大于n(仅在一次迭代后返回false!),但是如果所有值都小于它们的索引值,在n次迭代之后)。所以我们回到绘图板...但我们需要的只是一个微小的调整。考虑到算法可能已被写入以从n向下扫描到0,就像它可以从0向n向前扫描那样容易。如果我们结合从两端到中间迭代的逻辑,我们得到如下的算法:

This has pretty good performance if many of the values in a[] are greater than their index value, and has excellent performance if all values in a[] are greater than n (it returns false after only one iteration!), but it has dismal performance if all values are less than their index value (it will return false after n iterations). So we return to the drawing board... but all we need is a slight tweak. Consider that the algorithm could have been written to scan backwards from n down to 0 just as easily as it can scan forward from 0 to n. If we combine the logic of iterating from both ends toward the middle, we get an algorithm as follows:

// Algorithm 2
bool algorithm2(double* a, size_t len)
{
    for (size_t i=0, j=len-1; i<j; ++i,--j) // worst case is still O(n)
    {
        if (a[i]==i || a[j]==j)
            return true;
        if (a[i] > i)
            i = static_cast<size_t>(std::floor(a[i]));
        if (a[j] < j)
            j = static_cast<size_t>(std::ceil(a[j]));
    }
    return false;
}

这两种极端情况下都具有出色的性能0或大于n),并且具有相当好的性能与几乎任何其他值的分布。最坏的情况是如果数组的下半部分中的所有值小于它们的索引,并且上半部分中的所有值都大于它们的索引,在这种情况下,性能降低到O的最坏情况n)。最好的情况(极端情况)是O(1),而平均情况可能是O(log n),但我推迟给某个数学专业的人确定的确定性。

This has excellent performance in both of the extreme cases (all values are less than 0 or greater than n), and has pretty good performance with pretty much any other distribution of values. The worst case is if all of the values in the lower half of the array are less than their index and all of the values in the upper half are greater than their index, in which case the performance degrades to the worst-case of O(n). Best case (either extreme case) is O(1), while average case is probably O(log n) but I’m deferring to someone with a math major to determine that with certainty.

有几个人建议对问题采用分而治之的方法,而不指定如何分割问题,以及对递归分割的子问题如何处理。当然这种不完全的答案可能不会满足面试官。上述算法2的初始线性算法和最坏情况性能都是O(n),而算法2通过跳过(不检查)元素而将平均情况性能提高到(可能)O(log n)。分而治之的方法只能超越算法2,如果在平均情况下,它是以某种方式能够跳过更多的元素,而不是算法2可以跳过。让我们假设我们通过将数组拆分成两个(几乎)相等的连续半部分,并递归地确定,并且决定是否,由于产生的子问题,我们可能跳过更多的元素而不是算法2可以跳过,特别是在算法2的最坏情况。对于本讨论的剩余部分,让我们假设输入对于算法2是最坏的情况。在第一次分裂之后,我们可以检查两个半部的顶部和底部。底部元素,导致算法2的O(1)性能,但导致O(n)性能与两个半部分组合。如果下半部分中的所有元素小于0并且上半部分中的所有元素大于n-1,则将是这种情况。在这些情况下,我们可以立即排除底部和/或上半部分的O(1)性能,我们可以排除任何一半。当然,不能被该测试排除的任何一半的性能仍然在进一步递归之后确定,再将该半除以一半,直到我们找到其顶部或底部元素包含其索引值的任何段。这是一个比算法2相当不错的性能改进,但它只发生在算法2的最坏情况下的某些特殊情况。我们所做的分裂征服是减少(略)引起最坏情况行为的问题空间的比例。仍然有最坏情况的分而治之的情况,他们完全匹配大部分的问题空间,唤起算法2的最坏情况的行为。

Several people have suggested a "divide and conquer" approach to the problem, without specifying how the problem could be divided and what one would do with the recursively divided sub-problems. Of course such an incomplete answer would probably not satisfy the interviewer. The naïve linear algorithm and worst-case performance of algorithm 2 above are both O(n), while algorithm 2 improves the average-case performance to (probably) O(log n) by skipping (not examining) elements whenever it can. The divide-and-conquer approach can only outperform algorithm 2 if, in the average case, it is somehow able to skip more elements than algorithm 2 can skip. Let’s assume we divide the problem by splitting the array into two (nearly) equal contiguous halves , recursively, and decide if, with the resulting sub-problems, we are likely to be able to skip more elements than algorithm 2 could skip, especially in algorithm 2’s worst case. For the remainder of this discussion, let’s assume an input that would be worst-case for algorithm 2. After the first split, we can check both halves’ top & bottom elements for the same extreme case that results in O(1) performance for algorithm2, yet results in O(n) performance with both halves combined. This would be the case if all elements in the bottom half are less than 0 and all elements in the upper half are greater than n-1. In these cases, we can immediately exclude the bottom and/or top half with O(1) performance for any half we can exclude. Of course the performance of any half that cannot be excluded by that test remains to be determined after recursing further, dividing that half by half again until we find any segment whose top or bottom element contains its index value. That’s a reasonably nice performance improvement over algorithm 2, but it occurs in only certain special cases of algorithm 2’s worst case. All we’ve done with divide-and-conquer is decrease (slightly) the proportion of the problem space that evokes worst-case behavior. There are still worst-case scenarios for divide-and-conquer, and they exactly match most of the problem space that evokes worst-case behavior for algorithm 2.

考虑到分治法算法具有较少的最坏情况情况,使用分治法是不是有意义?

So, given that the divide-and-conquer algorithm has less worst-case scenarios, doesn’t it make sense to go ahead and use a divide-and-conquer approach?

总之,没有。也许。如果你知道大约一半的数据小于0,一半大于n,这种特殊情况通常会更好地与分而治之的方法。或者,如果您的系统是多核并且您的n很大,则可能有助于在所有核心之间平均分配问题,但一旦它们在它们之间拆分,我认为每个核心上的子问题可能是最好的解决了上面的算法2,避免进一步划分问题,并且肯定避免递归,正如我在下面讨论的。

In a word, no. Well, maybe. If you know up front that about half of your data is less than 0 and half is greater than n, this special case would generally fare better with the divide-and-conquer approach. Or, if your system is multicore and your ‘n’ is large, it might be helpful to split the problem evenly between all of your cores, but once it’s split between them, I maintain that the sub-problems on each core are probably best solved with algorithm 2 above, avoiding further division of the problem and certainly avoiding recursion, as I argue below....

在递归分频的每个递归级别 - 算法方法,该算法需要一些方法来记住尚未解决的问题的第二半,而它递归到上半场。通常这是通过让算法递归地调用自身一半,然后对另一个,一个设计,隐式地保持在运行时栈上的设计。另一个实现可以通过在显式堆栈上保持基本上相同的信息来避免递归函数调用。在空间增长方面,算法2是O(1),但是由于必须在某种堆栈上维护该信息,任何递归实现都不可避免地是O(log n)。但是除了空间问题,递归实现具有额外的运行时开销,用于记住尚未到达子问题半部分的状态,直到它们可以被递归。这个运行时开销不是免费的,并且考虑到上面的算法2的实现的简单性,我认为这种开销是成比例的重要。因此,我建议上面的算法2将彻底捣毁绝大多数情况下的任何递归实现。

At each recursion level of a recursive divide-and-conquer approach, the algorithm needs some way to remember the as-yet-unsolved 2nd half of the problem while it recurses into the 1st half. Often this is done by having the algorithm recursively call itself first for one half and then for the other, a design which maintains this information implicitly on the runtime stack. Another implementation might avoid recursive function calls by maintaining essentially this same information on an explicit stack. In terms of space growth, algorithm 2 is O(1), but any recursive implementation is unavoidably O(log n) due to having to maintain this information on some sort of stack. But aside from the space issue, a recursive implementation has extra runtime overhead of remembering the state of as-yet-unrecursed-into subproblem halves until such time as they can be recursed into. This runtime overhead is not free, and given the simplicity of algorithm 2’s implementation above, I posit that such overhead is proportionally significant. Therefore I suggest that algorithm 2 above will roundly spank any recursive implementation for the vast majority of cases.

这篇关于在排序数组中找到一个[i] = i的最有效的方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆