Algorithm with O(n log n) time and O(1) space complexity vs O(n) time and O(n) space complexity


Problem Description


I am curious to know which algorithm is better:

  • Algorithm with O(n log n) time and O(1) space complexity
  • Algorithm with O(n) time and O(n) space complexity

Most of the algorithms which are solved in O(n log n) time and constant space can be solved in O(n) time by paying a penalty in terms of space. Which algorithm is better? How do I decide between these two parameters?

Example: Array Pair Sum

  1. Can be solved in O(n log n) time by sorting
  2. Can be solved using hash maps in O(n) time but with O(n) space

Solution

Without actually testing anything (a risky move!), I'm going to claim that the O(n log n)-time, O(1)-space algorithm is probably faster than the O(n)-time, O(n)-space algorithm, but is still probably not the optimal algorithm.

First, let's talk about this from a high-level perspective that ignores the particular details of the algorithms you're describing. One detail to keep in mind is that although O(n)-time algorithms are asymptotically faster than O(n log n)-time algorithms, they're only faster by a logarithmic factor. Keeping in mind that the number of atoms in the universe is about 10^80 (thanks, physics!), the base-2 log of the number of atoms in the universe is about 240. From a practical perspective, this means that you can think of that extra O(log n) factor as just a constant. Consequently, to determine whether an O(n log n) algorithm will be faster or slower than an O(n) algorithm on a particular input, you'd need to know more about what constants are hidden by the big-O notation. An algorithm that runs in time 600n will be slower than an algorithm that runs in time 2n log n for any n that fits in the universe, for example. Therefore, in terms of wall-clock performance, to evaluate which algorithm is faster, you'd probably need to do a bit of profiling on the algorithms to see which one is faster.
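To put concrete numbers on that last example: 600n exceeds 2n log2(n) exactly when log2(n) < 300, i.e. when n < 2^300 (roughly 10^90), so for every input size that could physically exist, the O(n log n) algorithm with the smaller constant is the faster of the two.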

Then there's the effects of caching and locality of reference. Computer memory has a huge number of caches in it that are optimized for the case where reads and writes are located next to one another. The cost of a cache miss can be huge - hundreds or thousands of times slower than a hit - so you want to try to minimize this. If an algorithm uses O(n) memory, then as n gets larger, you need to start worrying about how closely-packed your memory accesses will be. If they're spread out, then the cost of the cache misses might start to add up pretty quickly, significantly driving up the coefficient hidden in the big-O notation of the time complexity. If they're more sequential, then you probably don't need to worry too much about this.
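One rough way to see this effect for yourself is to do the same amount of arithmetic over the same array with sequential versus shuffled access order. The C++ micro-benchmark below is only an illustrative sketch (numbers vary a lot by machine, compiler, and flags), not code from the answer:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Sums data[idx[0]], data[idx[1]], ... and prints the elapsed time, so the only
// difference between the two runs in main() is the order of the memory accesses.
static std::uint64_t timed_sum(const std::vector<int>& data,
                               const std::vector<std::size_t>& idx,
                               const char* label) {
    auto start = std::chrono::steady_clock::now();
    std::uint64_t sum = 0;
    for (std::size_t i : idx) sum += data[i];
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::cout << label << ": " << ms << " ms\n";
    return sum;  // returned (and compared below) so the compiler can't drop the work
}

int main() {
    const std::size_t n = 1u << 24;  // 16M ints: much larger than typical CPU caches
    std::vector<int> data(n, 1);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    std::uint64_t a = timed_sum(data, idx, "sequential");       // cache-friendly order

    std::shuffle(idx.begin(), idx.end(), std::mt19937_64{42});  // destroy locality
    std::uint64_t b = timed_sum(data, idx, "shuffled");         // cache-hostile order

    std::cout << (a == b ? "sums match\n" : "sums differ (bug)\n");
    return 0;
}
```

On typical hardware the shuffled run is several times slower even though it performs identical work, which is exactly the kind of hidden constant factor being described.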

You also need to be careful about total memory available. If you have 8GB of RAM on your system and get an array with one billion 32-bit integers, then if you need O(n) auxiliary space with even a reasonable constant, you're not going to be able to fit your auxiliary memory into main memory and it will start getting paged out by the OS, really killing your runtime.
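For scale: a billion 32-bit integers is about 4 GB on its own, and a general-purpose hash table typically stores far more than 4 bytes per element once buckets, node pointers, and load-factor slack are counted, so an O(n)-space auxiliary structure over that array can easily exceed the 8 GB of RAM in this example.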

Finally, there's the issue of randomness. Algorithms based on hashing have expected fast runtimes, but if you get a bad hash function, there's a chance that the algorithm will slow down. Generating good random bits is hard, so most hash tables just go for "reasonably good" hash functions, risking worst-case inputs that will make the algorithm's performance degenerate.

So how do these concerns actually play out in practice? Well, let's look at the algorithms. The O(n)-time, O(n)-space algorithm works by building a hash table of all the elements in the array so that you can easily check whether a given element is present in the array, then scanning over the array and seeing whether there is a pair that sums up to the total. Let's think about how this algorithm works given the factors above.
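As a concrete (if hypothetical) illustration, here is a minimal C++ sketch of that hash-based approach; it uses the common one-pass variant that checks for each element's complement before inserting it, which also avoids pairing an element with itself:

```cpp
#include <unordered_set>
#include <vector>

// Returns true if some pair of elements in `values` sums to `target`.
// Expected O(n) time, O(n) extra space for the hash set.
bool has_pair_sum_hash(const std::vector<int>& values, long long target) {
    std::unordered_set<long long> seen;  // elements visited so far
    for (int v : values) {
        if (seen.count(target - v)) {    // is the needed complement already there?
            return true;
        }
        seen.insert(v);                  // remember this element for later pairs
    }
    return false;
}
```

The lookups into `seen` land in effectively random buckets, which is precisely the locality issue raised in the first bullet below.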

  • The memory usage is O(n) and, due to how hashing works, the accesses to the hash table are not likely to be sequential (an ideal hash table would have pretty much random access patterns). This means that you're going to have a lot of cache misses.

  • The high memory usage means that for large inputs, you have to worry about memory getting paged in and out, exacerbating the above problem.

  • As a result of the above two factors, the constant term hidden in the O(n) runtime is likely much higher than it looks.

  • Hashing is not worst-case efficient, so there may be inputs that cause performance to significantly degrade.

Now, think about the O(n log n)-time, O(1) space algorithm, which works by doing an in-place array sort (say, heapsort), then walking inwards from the left and right and seeing if you can find a pair that sums to the target. The second step in this process has excellent locality of reference - virtually all array accesses are adjacent - and pretty much all of the cache misses you're going to get are going to be in the sorting step. This will increase the constant factor hidden in the big-O notation. However, the algorithm has no degenerate inputs and its low memory footprint probably means that the locality of reference will be better than the hash table approach. Therefore, if I had to guess, I'd put my money on this algorithm.
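And a matching sketch of the sort-and-scan approach (again just an illustrative implementation, not code from the answer), using the standard heap primitives so the sort is a genuine in-place heapsort:

```cpp
#include <algorithm>
#include <vector>

// Returns true if some pair of elements in `values` sums to `target`.
// O(n log n) time, O(1) extra space: in-place heapsort, then a two-pointer scan.
bool has_pair_sum_sorted(std::vector<int>& values, long long target) {
    std::make_heap(values.begin(), values.end());   // build a max-heap in place
    std::sort_heap(values.begin(), values.end());   // heapsort: repeatedly pop the max

    std::size_t lo = 0, hi = values.size();         // walk inwards from both ends
    while (hi - lo >= 2) {
        long long sum = static_cast<long long>(values[lo]) + values[hi - 1];
        if (sum == target) return true;
        if (sum < target) ++lo;   // need a bigger sum: advance the left pointer
        else              --hi;   // need a smaller sum: retreat the right pointer
    }
    return false;
}
```

Note that the function sorts the caller's array in place; that is the price of the O(1) extra-space bound.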

... Well, actually, I'd put my money on a third algorithm: an O(n log n)-time, O(log n)-space algorithm that's basically the above algorithm, but using introsort instead of heapsort. Introsort is an O(n log n)-time, O(log n)-space algorithm that uses randomized quicksort to mostly sort the array, switching to heapsort if the quicksort looks like it's about to degenerate, and doing a final insertion sort pass to clean everything up. Quicksort has amazing locality of reference - this is why it's so fast - and insertion sort is faster on small inputs, so this is an excellent compromise. Plus, O(log n) extra memory is basically nothing - remember, in practice, log n is at most 240. This algorithm has about the best locality of reference that you can get, giving a very low constant factor hidden by the O(n log n) term, so it would probably outperform the other algorithms in practice.
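For what it's worth, you rarely have to implement introsort yourself: std::sort in the mainstream C++ standard libraries is commonly implemented as exactly this kind of introsort, so under that assumption the previous sketch becomes this third algorithm simply by replacing the make_heap/sort_heap pair with a single std::sort(values.begin(), values.end()) call.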

Of course, I've got to qualify that answer as well. The analysis I did above assumes that we're talking about pretty large inputs to the algorithm. If you're only ever looking at small inputs, then this whole analysis goes out the window because the effects I was taking into account won't start to show up. In that case, the best option would just be to profile the approaches and see what works best. From there, you might be able to build a "hybrid" approach where you use one algorithm for inputs in one size range and a different algorithm for inputs in a different size range. Chances are that this would give an approach that beats any single one of the approaches.
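A hypothetical shape for that hybrid (the function names refer to the sketches above, and the crossover constant is entirely made up; profiling on your own hardware would supply the real number and decide which approach goes in which branch):

```cpp
#include <cstddef>
#include <vector>

// Placeholder crossover found by profiling; the value below is purely illustrative.
constexpr std::size_t kCrossover = 1u << 16;

// Dispatch to whichever approach profiling showed to be faster in each size range;
// the particular assignment here is only an example.
bool has_pair_sum(std::vector<int>& values, long long target) {
    if (values.size() < kCrossover) {
        return has_pair_sum_hash(values, target);    // e.g. hash table for small inputs
    }
    return has_pair_sum_sorted(values, target);      // e.g. sort + scan for large inputs
}
```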

That said, to paraphrase Don Knuth, "beware of the above analysis - I have merely proved it correct, not actually tried it." The best option would be to profile everything and see how it works. The reason I didn't do this was to go through the analysis of what factors to keep an eye out for and to highlight the weakness of a pure big-O analysis comparing the two algorithms. I hope that the practice bears this out! If not, I'd love to see where I got it wrong. :-)
