什么是博耶 - 穆尔字符串搜索算法移位规则? [英] What are the shift rules for Boyer–Moore string search algorithm?

查看:253
本文介绍了什么是博耶 - 穆尔字符串搜索算法移位规则?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在试图理解的博耶 - 穆尔字符串搜索算法转变规则的,但没有理解他们。我读到这里维基百科但就是太复杂!

这将是很大的帮助,如果有人列出了一个简单的方式的规则。

解决方案

在博耶 - 穆尔算法,你开始比较模式字符从图案结束文本字符。如果发现不匹配,你有型的配置。

  .... XYZABC ....< -text
  .... UABC<  - 模式
      ^
    不匹配
 

现在的人品不好移的手段,转移模式,使不匹配的文本字符对准该字符的最后一次出现的图案(图案减去最后一个模式字符的初始部分),如果有这种情况发生,或一个位置之前的图案,如果不匹配的字符不显示在所述图案的​​所有的初始部分

这可能是一个左移,如果情况是

  v
... xyzazc ...
 .... uazc
 ..uazc
 

让本身并不能保证进度。

另外转移,好后缀移的,对齐文本匹配的部分, M ,与字符序列中最右边出现在是由一个不同的字符pceded $ P $(包括不含,如果​​匹配的后缀也是图案的preFIX)比匹配后缀的图案的图案。 - 如果有这样的发生

因此​​,例如

  v
.... abcdabceabcfabc ...
    ... xabcfabcfabc
        ... xabcfabcfabc
 

将导致四个位置的好后缀的转变,因为匹配部分 M = abcfabc 出现在模式的后缀,发生左四地,是$ P由不同的人物有$ pceded( X 而不是 F )比在后缀位置。

如果没有在一个不同的字符比后缀pceded模式$ P $匹配的部分不完整的发生,良好的后缀移位对齐文本用的一个preFIX匹配部分的后缀模式,选择最大重叠,如

  v
... robocab ....
    abacab
        abacab
 

好后缀转变总是将这个模式的权利,所以保证了进度。

然后,在每一个不匹配的不良字符移位和良好的后缀换档的进展进行比较,并选择较大。它是在更深入解释基督教CHARRAS和Thierry Lecroq 此处,沿与其他许多字符串搜索算法。


有关您在评论中提到的例子中,

  SSIMPLE例
例
  ^
 

匹配的后缀是 MPLE ,和不匹配的文本字符是。因此,不良性格转变会在图案的起始部分的最后出现的。答案是没有,这样坏的性格转变将转移模式使不匹配是一个模式开始之前

  SSIMPLE例
   例
 

和良好的后缀移查找 MPLE 最右边出现在模式的没有的由 A MPLE 是的模式的preFIX。有在后缀之前的图案的匹配部分不完整的发生,使匹配部分的最长后缀也是图案的preFIX确定好后缀移位。在这种情况下,匹配部分的两个后缀是图案prefixes是单字符字符串电子和空字符串。最长的是明显非空字符串,所以良好的后缀移位对齐一个字符后缀电子与的单字符preFIX的文本匹配的部分该模式

  SSIMPLE例
      例
 

好后缀换档的换档模式远权,所以这是选择的转变。<​​/ P>

再有就是立即不匹配在最后的图案位置,然后将坏的角色转变对准 P 文本P 在模式(和良好的后缀移不必在所有如果不匹配发生在最后一图案字符,因为在这种情况下,它决不会产生较大的位移比坏字符移位考虑)。

然后,我们有完整的比赛。

在与模式 TXAMPLE 的变型,良好的后缀转变认定的匹配部分没有非空后缀的格局preFIX(和还有的格局没有发生完全匹配部分的没有的由 A pceded $ P $),这样的好后缀移位对齐空后缀该文本(电子和空间之间的边界)与图案的空preFIX的匹配部分(空字符串preceding的<中code> T ),从而

  SSIMPLE例
       TXAMPLE
 

(那么在接下来的步骤中,不良性格转变对两个 s,而在转入下一个不匹配此后发生在最初的 T 的模式)。

I have been trying to understand shift rules in Boyer–Moore string search algorithm but haven't understood them. I read here on wikipedia but that is too complex !

It will be of great help if someone lists the rule in a simple manner.

解决方案

In the Boyer-Moore algorithm, you start comparing pattern characters to text characters from the end of the pattern. If you find a mismatch, you have a configuration of the type

....xyzabc....      <-text
  ....uabc          <- pattern
      ^
    mismatch

Now the bad character shift means to shift the pattern so that the text character of the mismatch is aligned to the last occurrence of that character in the initial part of the pattern (pattern minus last pattern character), if there is such an occurrence, or one position before the pattern if the mismatched character doesn't appear in the initial part of the pattern at all.

That could be a shift to the left, if the situation is

     v
...xyzazc...
 ....uazc
 ..uazc

so that alone doesn't guarantee a progress.

The other shift, the good suffix shift, aligns the matched part of the text, m, with the rightmost occurrence of that character sequence in the pattern that is preceded by a different character (including none, if the matched suffix is also a prefix of the pattern) than the matched suffix m of the pattern - if there is such an occurrence.

So for example

           v
....abcdabceabcfabc...
    ...xabcfabcfabc
        ...xabcfabcfabc

would lead to a good suffix shift of four positions, since the matched part m = abcfabc occurs in the pattern four places left of its suffix-occurrence and is preceded by a different character there (x instead of f) than in the suffix position.

If there is no complete occurrence of the matched part in the pattern preceded by a different character than the suffix, the good suffix shift aligns a suffix of the matched part of the text with a prefix of the pattern, choosing maximal overlap, e.g.

      v
...robocab....
    abacab
        abacab

The good suffix shift always shifts the pattern to the right, so guarantees progress.

Then, on every mismatch the advances of the bad character shift and the good suffix shift are compared, and the greater is chosen. It is explained in greater depth by Christian Charras and Thierry Lecroq here, along with many other string searching algorithms.


For the example you mentioned in the comments,

SSIMPLE EXAMPLE
EXAMPLE
  ^

the matched suffix is MPLE, and the mismatched text character is I. So the bad character shift looks for the last occurrence of I in the initial part of the pattern. There is none, so that bad character shift would shift the pattern so that the mismatched I is one before the start of the pattern

SSIMPLE EXAMPLE
   EXAMPLE

and the good suffix shift looks for the rightmost occurrence of MPLE in the pattern not preceded by an A, or the longest suffix of MPLE that is a prefix of the pattern. There is no complete occurrence of the matched part in the pattern before the suffix, so the longest suffix of the matched part that is also a prefix of the pattern determines the good suffix shift. In this case, the two suffixes of the matched part that are prefixes of the pattern are the single-character string E, and the empty string. The longest is obviously the nonempty string, so the good suffix shift aligns the one-character suffix E in the matched part of the text with the one-character prefix of the pattern

SSIMPLE EXAMPLE
      EXAMPLE

The good suffix shift shifts the pattern farther right, so that is the chosen shift.

Then there is an immediate mismatch at the last pattern position, and then the bad character shift aligns the P in the text with the P in the pattern (and the good suffix shift need not be considered at all if the mismatch occurs at the last pattern character, since in that case, it would never produce a larger shift than the bad character shift).

Then we have the complete match.

In the variant with the pattern TXAMPLE, the good suffix shift finds that no non-empty suffix of the matched part is a prefix of the pattern (and there is no occurrence of the complete matched part in the pattern not preceded by A), so the good suffix shift aligns the empty suffix of the matched part of the text (the boundary between the E and the space) with the empty prefix of the pattern (the empty string preceding the T), resulting in

SSIMPLE EXAMPLE
       TXAMPLE

(then in the next step, the bad character shift aligns the two Ls, and the next mismatch in the step thereafter occurs at the initial T of the pattern).

这篇关于什么是博耶 - 穆尔字符串搜索算法移位规则?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆