How is it possible to do binary search on a doubly-linked list in O(n) time?


Question


I've heard that it's possible to implement binary search over a doubly-linked list in O(n) time. Accessing a random element of a doubly-linked list takes O(n) time, and binary search accesses O(log n) different elements, so shouldn't the runtime be O(n log n) instead?

Solution

It's technically correct to say that the runtime of binary search on a doubly-linked list is O(n log n), but that's not a tight upper bound. Using a slightly better implementation of binary search and a more clever analysis, it's possible to get binary search to run in time O(n).

The basic idea behind binary search is the following:

  • If the list is empty, the element being searched for doesn't exist.
  • Otherwise:
    • Look at the middle element.
    • If it matches the element in question, return it.
    • If it's bigger than the element in question, discard the back half of the list.
    • If it's smaller than the element in question, discard the front half of the list.
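
The steps above can be sketched in Python on an ordinary sorted array (a minimal illustration; the function and variable names are mine, not from the answer):

```python
def binary_search(items, target):
    lo, hi = 0, len(items)           # search within items[lo:hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if items[mid] == target:     # matches: return its position
            return mid
        elif items[mid] > target:    # bigger: discard the back half
            hi = mid
        else:                        # smaller: discard the front half
            lo = mid + 1
    return None                      # range is empty: not present
```

Each iteration halves the candidate range, so only O(log n) comparisons are ever made; on an array each lookup is O(1), and that lookup cost is exactly what changes for a linked list.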

A naive implementation of binary search on a doubly-linked list would work by computing the indices to look up on each iteration (just like in the array case), then accessing each one by starting at the front of the list and scanning forward the appropriate number of steps. This is indeed very slow. If the element being searched for is at the very end of the list, the indices looked up would be n/2, 3n/4, 7n/8, etc. Summing up the work done in the worst case, we get

n/2 + 3n/4 + 7n/8 + 15n/16 + ... (Θ(log n) terms)

≥ n/2 + n/2 + ... + n/2 (Θ(log n) terms)

= Θ(n log n)

and

n/2 + 3n/4 + 7n/8 + 15n/16 + ... (Θ(log n) terms)

≤ n + n + ... + n (Θ(log n) terms)

= Θ(n log n)

Therefore, the worst-case time complexity for this algorithm is Θ(n log n).
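
The naive approach can be sketched as follows (a hypothetical `Node` class and list builder; assumed helpers, not from the answer). The inefficiency is the inner loop that restarts at the head on every lookup:

```python
class Node:
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

def from_list(values):
    """Build a doubly-linked list from a sorted sequence; return (head, length)."""
    head = prev = None
    for v in values:
        node = Node(v)
        if prev is None:
            head = node
        else:
            prev.next, node.prev = node, prev
        prev = node
    return head, len(values)

def naive_binary_search(head, length, target):
    lo, hi = 0, length
    while lo < hi:
        mid = (lo + hi) // 2
        node = head                  # restart at the front every time...
        for _ in range(mid):         # ...and walk mid steps: Θ(n) per lookup
            node = node.next
        if node.value == target:
            return node
        elif node.value > target:
            hi = mid
        else:
            lo = mid + 1
    return None
```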

However, we can speed this up by a factor of Θ(log n) by being more clever with our approach. The reason the previous algorithm is slow is that every time we need to look up an element, we start the search from the beginning of the array. However, we don't need to do this. After looking up the middle element the first time, we're already in the middle of the array, and we know that the next lookup we're going to make will be either at position n / 4 or 3n / 4, which is only distance n / 4 from where we left off (compared to n / 4 or 3n / 4 if we start from the beginning of the array). What if we just trekked out from our stopping position (n / 2) to the next position, rather than restarting at the front of the list?

Here's our new algorithm. Begin by scanning to the middle of the array, which requires n / 2 steps. Then, determine whether to visit the element at the middle of the first half of the array or the middle of the second half of the array. Reaching there from position n / 2 only requires n / 4 total steps. From there, going to the midpoint of the quarter of the array containing the element takes only n / 8 steps, and going from there to the midpoint of the eighth of the array containing the element takes only n / 16 steps, etc. This means that the total number of steps made is given by

n/2 + n/4 + n/8 + n/16 + ...

= n (1/2 + 1/4 + 1/8 + ...)

≤ n

This follows from the fact that the sum of the infinite geometric series 1/2 + 1/4 + 1/8 + ... is 1. Therefore, the total work done in the worst case is only Θ(n), which is much better than the Θ(n log n) worst case from before.
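
The improved algorithm can be sketched under the same assumptions (an illustrative `Node` class with `value`/`prev`/`next` fields, not from the answer): instead of restarting at the head, remember the node we stopped at and trek forward or backward along the links:

```python
class Node:
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

def from_list(values):
    """Build a doubly-linked list from a sorted sequence; return (head, length)."""
    head = prev = None
    for v in values:
        node = Node(v)
        if prev is None:
            head = node
        else:
            prev.next, node.prev = node, prev
        prev = node
    return head, len(values)

def binary_search_linked(head, length, target):
    lo, hi = 0, length
    node, pos = head, 0              # current node and its index in the list
    while lo < hi:
        mid = (lo + hi) // 2
        while pos < mid:             # trek from wherever we stopped last,
            node, pos = node.next, pos + 1
        while pos > mid:             # forward or backward as needed
            node, pos = node.prev, pos - 1
        if node.value == target:     # one comparison per halving step:
            return node              # O(log n) comparisons overall
        elif node.value > target:
            hi = mid
        else:
            lo = mid + 1
    return None
```

Each halving step still makes exactly one value comparison, so even though the pointer walks total O(n) steps, only O(log n) comparisons are performed.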

One last detail: why would you ever do this? After all, it already takes O(n) time to search a doubly-linked list for an element. One major advantage of this approach is that even though the runtime is O(n), we only end up doing O(log n) total comparisons (one per step of the binary search). This means that if comparisons are expensive, we might end up doing less work using a binary search than doing a normal linear search, since the O(n) comes from the work done walking the list rather than the work done making comparisons.

