二维字符串匹配:Baker-Bird算法 [英] 2D String Matching: Baker-Bird Algorithm

查看:97
本文介绍了二维字符串匹配:Baker-Bird算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想在一个巨大的矩阵中找到一个子矩阵,所以我用Google搜索并找到了Baker-Bird算法。

I want to find a submatrix in a huge matrix, so I google and find the Baker-Bird Algorithm.

但是,不幸的是,我不太了解它,有关它的教程也很少见。

But, unfortunately I cannot understand it very much, and the tutorial about it is rare.

我不能

所以我想问的是,有一些简单的示例代码或伪代码可以学习吗?

So what I want to ask is there some simple example code or pseudo code that I can study it?

Thx。

推荐答案

好,通过研究肯特·芒特·卡斯珀森(Kent Munthe Caspersen)给的链接( http://www.stringology.org/papers/Zdarek-PhD_thesis-2010.pdf 第30页),我了解了贝克-伯德算法的工作原理。

Ok, from studying the link Kent Munthe Caspersen gave ( http://www.stringology.org/papers/Zdarek-PhD_thesis-2010.pdf page 30 on), I understand how the Baker-Bird Algorithm works.

要使子矩阵出现在矩阵中,其所有列必须分别匹配。您可以向下扫描各列以查找匹配项,然后扫描此后处理的矩阵以查找表示在同一位置连续匹配的列的行。

For a submatrix to appear in a matrix, its columns must all match individually. You can scan down each column looking for matches, and then scan this post-processed matrix for rows indicating columns consecutively matching at the same spot.

说,我们正在寻找子矩阵格式

Say we are looking for submatrices of the format

a c a
b b a
c a b

我们在每一列中搜索列匹配项 abc, cba或 aab,并在新的矩阵中标记相应匹配项的结尾单元格-例如带有A,B或C的单元格。(本文中的算法要做的是构造一个状态机,该状态机根据旧的状态号转换到新的状态,然后是下一个字母,然后寻找表示我们的状态刚匹配了一个列,这更加复杂,但是效率更高,因为它只需要扫描每个列一次,而不是我们感兴趣的每个不同的列匹配一次)

We search down each column for the column-matches 'abc' 'cba' or 'aab' and in a new matrix mark the ends of those complete matches in the corresponding cell - for example with A, B or C. (What the algorithm in the paper does is construct a state machine which transitions to a new state based on the old state number and which letter comes next, and then looks for states that indicate we just matched a column, which is more complex but more efficient as it only has to scan each column once instead of once per different column match we are interested in)

完成此操作后,我们将沿着每一行进行扫描以查找指示连续列的连续值匹配-在这种情况下,我们正在矩阵行中查找字符串 ABC。如果找到它,这里就有一个子数组匹配。

Once we have done this, we scan along each row looking for successive values indicating successive columns matched - in this case, we're looking for the string 'ABC' in a matrix row. If we find it, there was a sub-array match here.

使用上述状态机方法以及选择字符串搜索算法(有很多不同的时间复杂性:(其中有很多: http://en.wikipedia.org/ wiki / String_searching_algorithm

Speedups are attained from using the state machine approach described above, and also from choice of string searching algorithm ( there are many with different time complexities: ( of which there are numerous: http://en.wikipedia.org/wiki/String_searching_algorithm )

(请注意,整个算法当然可以翻转为行首先是列,这是相同的。)

(Note that the entire algorithm can, of course, be flipped to do rows first than columns, it's identical.)

这篇关于二维字符串匹配:Baker-Bird算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆