How can I better understand the one-comparison-per-iteration binary search?

Question

What is the point of the one-comparison-per-iteration binary search? And can you explain how it works?

Recommended answer

There are two reasons to binary search with one comparison per iteration. The less important is performance. Detecting an exact match early using two comparisons per iteration saves an average of one iteration of the loop, whereas (assuming comparisons involve significant work) binary searching with one comparison per iteration almost halves the work done per iteration.

Binary searching an array of integers, it probably makes little difference either way. Even with a fairly expensive comparison, asymptotically the performance is the same, and the half-rather-than-minus-one probably isn't worth pursuing in most cases. Besides, expensive comparisons are often coded as functions that return negative, zero or positive for <, == or >, so you can get both comparisons for pretty much the price of one anyway.

The important reason to do binary searches with one comparison per iteration is because you can get more useful results than just some-equal-match. The main searches you can do are...

  • first key > goal
  • first key >= goal
  • first key == goal
  • last key < goal
  • last key <= goal
  • last key == goal

These all reduce to the same basic algorithm. Understanding this well enough that you can code all the variants easily isn't that difficult, but I've not really seen a good explanation - only pseudocode and mathematical proofs. This is my attempt at an explanation.

There are games where the idea is to get as close as possible to a target without overshooting. Change that to "undershooting", and that's what "Find First >" does. Consider the ranges at some stage during the search...

| lower bound     | goal                    | upper bound
+-----------------+-------------------------+--------------
|         Illegal | better            worse |
+-----------------+-------------------------+--------------

The range between the current upper and lower bounds still needs to be searched. Our goal is (normally) in there somewhere, but we don't yet know where. The interesting point about items above the upper bound is that they are legal in the sense that they are greater than the goal. We can say that the item just above the current upper bound is our best-so-far solution. We can even say this at the very start, even though there is probably no item at that position - in a sense, if there is no valid in-range solution, the best solution that hasn't been disproved is just past the upper bound.

At each iteration, we pick an item to compare between the upper and lower bound. For binary search, that's a rounded half-way item. For binary tree search, it's dictated by the structure of the tree. The principle is the same either way.

As we are searching for an item greater than our goal, we compare the test item using item[testpos] > goal. If the result is false, we have overshot (or undershot) our goal, so we keep our existing best-so-far solution, and adjust our lower bound upwards. If the result is true, we have found a new best-so-far solution, so we adjust the upper bound down to reflect that.

Either way, we never want to compare that test item again, so we adjust our bound to eliminate (only just) the test item from the range to search. Being careless with this usually results in infinite loops.

Normally, half-open ranges are used - an inclusive lower bound and an exclusive upper bound. Using this system, the item at the upper bound index is not in the search range (at least not now), but it is the best-so-far solution. When you move the lower bound up, you move it to testpos+1 (to exclude the item you just tested from the range). When you move the upper bound down, you move it to testpos (the upper bound is exclusive anyway).

if (item[testpos] > goal)
{
  //  new best-so-far
  upperbound = testpos;
}
else
{
  lowerbound = testpos + 1;
}

When the range between the lower and upper bounds is empty (using half-open, when both have the same index), your result is your most recent best-so-far solution, just above your upper bound (i.e. at the upper bound index for half-open).

So the complete algorithm is...

while (upperbound > lowerbound)
{
  testpos = lowerbound + ((upperbound-lowerbound) / 2);

  if (item[testpos] > goal)
  {
    //  new best-so-far
    upperbound = testpos;
  }
  else
  {
    lowerbound = testpos + 1;
  }
}
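The loop above can be wrapped into a complete function. The following is a minimal sketch in C, assuming a sorted array of int and size_t indexes; the name find_first_greater and the convention of returning count when no item qualifies are illustrative choices, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first element greater than
   goal in the sorted array items[0..count), or count if there is none. */
size_t find_first_greater(const int *items, size_t count, int goal)
{
    assert(items != NULL || count == 0);

    size_t lowerbound = 0;      /* inclusive */
    size_t upperbound = count;  /* exclusive; also the best-so-far answer */

    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);

        if (items[testpos] > goal)
        {
            /* new best-so-far */
            upperbound = testpos;
        }
        else
        {
            /* the tested item is excluded from the remaining range */
            lowerbound = testpos + 1;
        }
    }

    return upperbound;
}
```

Starting with upperbound equal to count is what makes the imaginary past-the-end position the initial best-so-far answer.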

To change from first key > goal to first key >= goal, you literally switch the comparison operator in the if line. The relational operator and goal could be replaced by a single parameter - a predicate function that returns true if (and only if) its parameter is on the greater-than side of the goal.
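One way to sketch the predicate idea in C is with a function pointer plus a context pointer that carries the goal. The names find_boundary, upper_side_fn, greater_than and greater_or_equal are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate type: returns true if (and only if) item is on
   the greater-than side of the goal carried in context. */
typedef bool (*upper_side_fn)(int item, const void *context);

/* Returns the index of the first item for which the predicate is true,
   or count if it is false for every item. */
size_t find_boundary(const int *items, size_t count,
                     upper_side_fn is_upper_side, const void *context)
{
    assert(is_upper_side != NULL);

    size_t lowerbound = 0;
    size_t upperbound = count;

    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);

        if (is_upper_side(items[testpos], context))
        {
            upperbound = testpos;
        }
        else
        {
            lowerbound = testpos + 1;
        }
    }

    return upperbound;
}

/* Predicate for "first key > goal". */
static bool greater_than(int item, const void *context)
{
    return item > *(const int *)context;
}

/* Predicate for "first key >= goal". */
static bool greater_or_equal(int item, const void *context)
{
    return item >= *(const int *)context;
}
```

With this shape, switching between the two searches means passing a different predicate; the loop itself does not change.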

That gives you "first >" and "first >=". To get "first ==", use "first >=" and add an equality check after the loop exits.
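A possible C sketch of "first ==" built on "first >=" follows. The function name find_first_equal and the use of count as the "not found" value are assumptions made for the example. Note that the range check comes before the equality check, because the loop may finish at the past-the-end position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first element equal to goal in the
   sorted array items[0..count), or count if goal is not present. */
size_t find_first_equal(const int *items, size_t count, int goal)
{
    assert(items != NULL || count == 0);

    size_t lowerbound = 0;
    size_t upperbound = count;

    /* "first >=" search */
    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);

        if (items[testpos] >= goal)
        {
            upperbound = testpos;
        }
        else
        {
            lowerbound = testpos + 1;
        }
    }

    /* equality check after the loop, guarded against the out-of-range result */
    if (upperbound < count && items[upperbound] == goal)
    {
        return upperbound;
    }

    return count;
}
```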

For "last <" etc, the principle is the same as above, but the range is reflected. This just means you swap over the bound-adjustments (but not the comment) as well as changing the operator. But before doing that, consider the following...

a >  b  ==  !(a <= b)
a >= b  ==  !(a <  b)

Also...

  • position (last key < goal) = position (first key >= goal) - 1
  • position (last key <= goal) = position (first key > goal) - 1
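These two identities mean the "last" searches need no loop of their own. Here is a hedged sketch in C, where first_index, find_last_less and find_last_less_or_equal are hypothetical names and -1 is an assumed convention for "no such item".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: "first key >= goal" when or_equal is true,
   "first key > goal" otherwise. Returns count if nothing qualifies. */
static size_t first_index(const int *items, size_t count, int goal, bool or_equal)
{
    size_t lowerbound = 0;
    size_t upperbound = count;

    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);
        bool upper_side = or_equal ? (items[testpos] >= goal)
                                   : (items[testpos] > goal);

        if (upper_side)
        {
            upperbound = testpos;
        }
        else
        {
            lowerbound = testpos + 1;
        }
    }

    return upperbound;
}

/* position (last key < goal) = position (first key >= goal) - 1 */
ptrdiff_t find_last_less(const int *items, size_t count, int goal)
{
    assert(items != NULL || count == 0);
    return (ptrdiff_t)first_index(items, count, goal, true) - 1;
}

/* position (last key <= goal) = position (first key > goal) - 1 */
ptrdiff_t find_last_less_or_equal(const int *items, size_t count, int goal)
{
    assert(items != NULL || count == 0);
    return (ptrdiff_t)first_index(items, count, goal, false) - 1;
}
```

The result is signed here so that the imaginary position just below the first item can be reported as -1.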

When we move our bounds during the search, both sides are being moved towards the goal until they meet at the goal. And there is a special item just below the lower bound, just as there is just above the upper bound...

while (upperbound > lowerbound)
{
  testpos = lowerbound + ((upperbound-lowerbound) / 2);

  if (item[testpos] > goal)
  {
    //  new best-so-far for first key > goal at [upperbound]
    upperbound = testpos;
  }
  else
  {
    //  new best-so-far for last key <= goal at [lowerbound - 1]
    lowerbound = testpos + 1;
  }
}

So in a way, we have two complementary searches running at once. When the upperbound and lowerbound meet, we have a useful search result on each side of that single boundary.

For all cases, there's the chance that an original "imaginary" out-of-bounds best-so-far position was your final result (there was no match in the search range). This needs to be checked before doing a final == check for the first == and last == cases. It might be useful behaviour, as well - e.g. if you're searching for the position to insert your goal item, adding it after the end of your existing items is the right thing to do if all the existing items are smaller than your goal item.
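The insertion case can be sketched in C as follows. The function insert_sorted is a hypothetical illustration that assumes the caller has already reserved room for one more element. When the search ends at the past-the-end position, the item is simply appended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts value into the sorted array items[0..*count),
   keeping it sorted, and returns the index used. The caller is assumed to
   guarantee capacity for at least *count + 1 elements. */
size_t insert_sorted(int *items, size_t *count, int value)
{
    assert(items != NULL && count != NULL);

    size_t lowerbound = 0;
    size_t upperbound = *count;

    /* "first key > value" search; may end at the past-the-end position */
    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);

        if (items[testpos] > value)
        {
            upperbound = testpos;
        }
        else
        {
            lowerbound = testpos + 1;
        }
    }

    /* shift the tail up by one; moves zero bytes when appending at the end */
    memmove(&items[upperbound + 1], &items[upperbound],
            (*count - upperbound) * sizeof items[0]);
    items[upperbound] = value;
    *count += 1;

    return upperbound;
}
```

Using "first >" rather than "first >=" places a new item after any existing equal items, which keeps insertion order stable among equals.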

A couple of notes on the selection of the testpos...

testpos = lowerbound + ((upperbound-lowerbound) / 2);

First off, this will never overflow, unlike the more obvious ((lowerbound + upperbound)/2). It also works with pointers as well as integer indexes.
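To make the overflow point concrete, here is a small C sketch that uses 32-bit unsigned values, for which wraparound is well defined (with signed int the overflow would be undefined behaviour). The function names midpoint_naive and midpoint_safe are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The "obvious" midpoint: the sum can wrap around for large bounds. */
uint32_t midpoint_naive(uint32_t lowerbound, uint32_t upperbound)
{
    uint32_t sum = lowerbound + upperbound;   /* reduced modulo 2^32 */
    return sum / 2;
}

/* The form used in the search loop: the difference never exceeds
   upperbound, so nothing wraps as long as lowerbound <= upperbound. */
uint32_t midpoint_safe(uint32_t lowerbound, uint32_t upperbound)
{
    assert(lowerbound <= upperbound);
    return lowerbound + ((upperbound - lowerbound) / 2);
}
```

For bounds 0xC0000000 and 0xE0000000 the naive form yields 0x50000000, which is outside the range, while the safe form yields 0xD0000000.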

Second, the division is assumed to round down. Rounding down for non-negatives is OK (all you can be sure of in C) as the difference is always non-negative anyway.

This is one aspect that may need care if you use non-half-open ranges, though - make sure the test position is inside the search range, and not just outside (on one of the already-found best-so-far positions).

Finally, in a binary tree search, the moving of bounds is implicit and the choice of testpos is built into the structure of the tree (which may be unbalanced), yet the same principles apply for what the search is doing. In this case, we choose our child node to shrink the implicit ranges. For first match cases, either we've found a new smaller best match (go to the lower child in hopes of finding an even smaller and better one) or we've overshot (go to the higher child in hopes of recovering). Again, the four main cases can be handled by switching the comparison operator.
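The tree version might look like the following C sketch. The node layout (key, lower, higher) and the name tree_find_first_greater are assumptions for the example; a NULL result plays the role of the imaginary past-the-end position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout for a binary search tree of int keys. */
struct node
{
    int key;
    struct node *lower;   /* subtree with smaller keys */
    struct node *higher;  /* subtree with larger keys */
};

/* Returns the node holding the smallest key greater than goal,
   or NULL if no key in the tree is greater than goal. */
const struct node *tree_find_first_greater(const struct node *root, int goal)
{
    const struct node *best = NULL;   /* best-so-far; NULL means "none yet" */
    const struct node *current = root;

    while (current != NULL)
    {
        if (current->key > goal)
        {
            /* new best-so-far; look for an even smaller match */
            best = current;
            current = current->lower;
        }
        else
        {
            /* not yet past the goal; look among larger keys */
            current = current->higher;
        }
    }

    assert(best == NULL || best->key > goal);
    return best;
}
```

Choosing a child shrinks the implicit range in the same way that moving upperbound or lowerbound does in the array version.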

BTW - there are more possible operators to use for that template parameter. Consider an array sorted by year then month. Maybe you want to find the first item for a particular year. To do this, write a comparison function that compares the year and ignores the month - the goal compares as equal if the year is equal, but the goal value may be a different type to the key that doesn't even have a month value to compare. I think of this as a "partial key comparison"; plug that into your binary search template and you get what I think of as a "partial key search".
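A partial key search along these lines could be sketched in C like this. The struct year_month and the function find_first_in_year are hypothetical; the goal is a plain int year, a different type from the key, and the month is never examined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key type: items are sorted by year, then by month. */
struct year_month
{
    int year;
    int month;
};

/* Returns the index of the first item whose year equals goal_year,
   or count if no item has that year. Only the year is compared. */
size_t find_first_in_year(const struct year_month *items, size_t count,
                          int goal_year)
{
    assert(items != NULL || count == 0);

    size_t lowerbound = 0;
    size_t upperbound = count;

    /* "first >=" search using the partial key comparison */
    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);

        if (items[testpos].year >= goal_year)
        {
            upperbound = testpos;
        }
        else
        {
            lowerbound = testpos + 1;
        }
    }

    if (upperbound < count && items[upperbound].year == goal_year)
    {
        return upperbound;
    }

    return count;
}
```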

EDIT The paragraph below used to say "31 Dec 1999 to be equal to 1 Feb 2000". That wouldn't work unless the whole range in-between was also considered equal. The point is that all three parts of the begin- and end-of-range dates differ, so you're not dealing with a "partial" key, but the keys considered equivalent for the search must form a contiguous block in the container, which will normally imply a contiguous block in the ordered set of possible keys.

It's not strictly just "partial" keys, either. Your custom comparison might consider 31 Dec 1999 to be equal to 1 Jan 2000, yet all other dates different. The point is the custom comparison must agree with the original key about the ordering, but it might not be so picky about considering all different values different - it can treat a range of keys as an "equivalence class".

An extra note about bounds that I really should have included before, but I may not have thought about it this way at the time.

One way of thinking about bounds is that they aren't item indexes at all. A bound is the boundary line between two items, so you can number the boundary lines as easily as you can number the items...

|     |     |     |     |     |     |     |     |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
| |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
|     |     |     |     |     |     |     |     |
0     1     2     3     4     5     6     7     8

Obviously the numbering of bounds is related to the numbering of the items. As long as you number your bounds left-to-right and the same way you number your items (in this case starting from zero) the result is effectively the same as the common half-open convention.

It would be possible to select a middle bound to bisect the range precisely into two, but that's not what a binary search does. For binary search, you select an item to test - not a bound. That item will be tested in this iteration and must never be tested again, so it's excluded from both subranges.

|     |     |     |     |     |     |     |     |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
| |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
| +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
|     |     |     |     |     |     |     |     |
0     1     2     3     4     5     6     7     8
                           ^
      |<-------------------|------------->|
                           |
      |<--------------->|  |  |<--------->|
          low range        i     hi range

So the testpos and testpos+1 in the algorithm are the two cases of translating the item index into the bound index. Of course if the two bounds are equal, there are no items in that range to choose, so the loop cannot continue, and the only possible result is that one bound value.

The ranges shown above are just the ranges still to be searched - the gap we intend to close between the proven-lower and proven-higher ranges.

In this model, the binary search is searching for the boundary between two ordered kinds of values - those classed as "lower" and those classed as "higher". The predicate test classifies one item. There is no "equal" class - equal-to-key values are part of the higher class (for x[i] >= key) or the lower class (for x[i] > key).
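One consequence of this model is that the items equal to the key are exactly those between the two boundaries. As a rough C sketch (the names boundary and count_equal are hypothetical), running the search once with each classification gives the extent of the equal run:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: finds the boundary between "lower" and "higher"
   items. Equal items are classed as higher when equal_is_higher is true
   (x[i] >= key) and as lower otherwise (x[i] > key). */
static size_t boundary(const int *x, size_t count, int key, bool equal_is_higher)
{
    size_t lowerbound = 0;
    size_t upperbound = count;

    while (upperbound > lowerbound)
    {
        size_t testpos = lowerbound + ((upperbound - lowerbound) / 2);
        bool higher = equal_is_higher ? (x[testpos] >= key)
                                      : (x[testpos] > key);

        if (higher)
        {
            upperbound = testpos;
        }
        else
        {
            lowerbound = testpos + 1;
        }
    }

    return upperbound;
}

/* Number of items equal to key in the sorted array x[0..count). */
size_t count_equal(const int *x, size_t count, int key)
{
    size_t begin = boundary(x, count, key, true);    /* first x[i] >= key */
    size_t end = boundary(x, count, key, false);     /* first x[i] >  key */

    assert(begin <= end);
    return end - begin;
}
```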
