最长的回文子字符串和后缀特里 [英] Longest palindromic substring and suffix trie

查看:195
本文介绍了最长的回文子字符串和后缀特里的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是Googling的一个相当有名的问题,即:最长的回文子串

我发现链接推荐后缀尝试为很好的解决问题。

示例 SO Algos

该方法是(据我所知)例如对于一个字符串 S 创建 Sr (这是 S )然后创建一个广义的后缀特里。

然后找到最长的公共sustring的 S Sr 这是从根到最深节点的路径,它既属于 S Sr

所以使用后缀尝试方法的解决方法基本上减少到找到最长的公共子串问题。

我的问题是: >
如果输入的字符串是: S =abacdfgdcaba so, Sr =abacdgfdcaba最长的公共子串是 abacd 这是一个回文。

所以我的问题是:使用后缀尝试的方法错误?我是在这里误解/误读吗?

解决方案

是的,通过使用LCS像算法找到最长的回文不是一个好办法,仔细阅读引用的答案,但答案中的这一行是完全错误的:


所以字符串中最长的回文正好是最长的这个字符串的公共子字符串及其相反的


但是如果你读了它,你有一个柜台示例,不要担心(你是正确的99%),这是常见的错误,但简单的方法如下:



记下字符串( barbapapa )如下:#b#a#r#b#a#p#a#p#a#,现在将这个新字符串的每个字符从左到左右,检查它的左右是否检查是否是回文中心。在最坏的情况下,该算法是O(n ^ 2),并且工作完全正确。但通常会在O(n)中发现回文(肯定证明这一点在平均情况下是困难的)。最糟糕的情况是字符串与太多长的回文像 aaaaaa ... aaaa



但是有更好的方法这需要O(n)时间,该算法的基础是 Manacher 。相关算法比我在参考答案中看到的更为复杂。但是我提供的是Manacher算法的基本思想,通过巧妙地改变算法,您可以跳过检查所有的左边和右边的权限(也有使用后缀树的算法)。






PS:由于我的互联网限制,我看不到您的Algo链接,我不知道它是否正确。



我添加了OP的讨论,以澄清算法:

 让它用barbapapa-> #b#a#r#b#a#p#a#p#a#,从头开始#
没有左,所以它是长度为1的回文中心。
现在b #在左边和#在右边,但没有另一个左边匹配右
,所以它是一个长度为3的回文中心。
允许跳过其他部分到达第一个p:
左,右是第二左,右是a,第三左,
右是#,但左右不平等,所以它是中心的回文
的长度7 #a #p#a#是回文,但b#a#p#a#p不是
现在先看看第一个p,然后在#a#p#a#p#a#作为回文和这个a
是这个回文的中心,如果您计算所有其他回文
,其长度小于11

另外使用是因为考虑到长度的回文。



在新创建的字符串中找到回文中心后,找到相关的回文(通过了解中心和它的长度),然后删除以找出最大的回文。


I was Googling about a rather well-known problem, namely: the longest palindromic substring
I have found links that recommend suffix tries as a good solution to the problem.
Example SO and Algos
The approach is (as I understand it) e.g. for a string S create Sr (which is S reversed) and then create a generalized suffix trie.
Then find the longest common sustring of S and Sr which is the path from the root to the deepest node that belongs both to S and Sr.
So the solution using the suffix tries approach essentially reduces to Find the longest common substring problem.
My question is the following:
If the input string is: S = "abacdfgdcaba" so , Sr = "abacdgfdcaba" the longest common substring is abacd which is NOT a palindrome.
So my question is: Is the approach of using suffix tries erroneous? Am i missunderstanding/misreading here?

解决方案

Yes, finding longest palindrome by using LCS like algorithms is not a good way, I didn't read referenced answer carefully but this line in the answer is completely wrong:

So the longest contained palindrome within a string is exactly the longest common substring of this string and its reverse

but if you read it and you have a counter example don't worry about it (you are right in 99%), this is common mistake, But simple way is as follow:

Write down the string (barbapapa) as follow: #b#a#r#b#a#p#a#p#a#, now traverse each character of this new string from left to right, check its left and right to check whether it's a palindrome center or not. This algorithm is O(n^2) in worst case and works perfectly correct. but normally will finds palindrome in O(n) (sure proving this in average case is hard). Worst case is in strings with too many long palindromes like aaaaaa...aaaa.

But there is better approach which takes O(n) time, base of this algorithm is by Manacher. Related algorithm is more complicated than what I saw in your referenced answer. But what I offered is base idea of Manacher algorithm, with clever changes in algorithm you can skip checking all left and rights (also there are algorithms by using suffix trees).


P.S: I couldn't see your Algo link because of my internet limitations, I don't know it's correct or not.

I added my discussion with OP to clarify the algorithm:

let test it with barbapapa-> #b#a#r#b#a#p#a#p#a#, start from first #
there is no left so it's center of palindrome with length 1.
Now "b",has # in left and # in right, but there isn't another left to match with right 
so it's a center of palindrome with length 3.
let skip other parts to arrive to first "p":
first left and right is # second left and right is "a", third left and
right is # but forth left and right are not equal so it's center of palindrome
of length 7 #a#p#a# is palindrome but b#a#p#a#p is not 
Now let see first "a" after first "p" you have, #a#p#a#p#a# as palindrome and this "a" 
is center of this palindrome with length 11 if you calculate all other palindromes 
length of all of them are smaller than 11

Also using # is because considering palindromes of even length.

After finding center of palindrome in newly created string, find related palindrom (by knowing the center and its length), then remove # to find out biggest palindrome.

这篇关于最长的回文子字符串和后缀特里的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆