来自两个以上字符串的最长公共子字符串 [英] Longest common substring from more than two strings
问题描述
我正在寻找一个 Python 库,用于从一组字符串中找到最长的公共子字符串.有两种方法可以解决这个问题:
I'm looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:
- 使用后缀树
- 使用动态规划.
实现的方法并不重要.重要的是它可以用于一组字符串(不仅仅是两个字符串).
Method implemented is not important. It is important it can be used for a set of strings (not only two strings).
推荐答案
这些成对的函数将在任意字符串数组中找到最长的公共字符串:
These paired functions will find the longest common string in any arbitrary array of strings:
def long_substr(data):
substr = ''
if len(data) > 1 and len(data[0]) > 0:
for i in range(len(data[0])):
for j in range(len(data[0])-i+1):
if j > len(substr) and is_substr(data[0][i:i+j], data):
substr = data[0][i:i+j]
return substr
def is_substr(find, data):
if len(data) < 1 and len(find) < 1:
return False
for i in range(len(data)):
if find not in data[i]:
return False
return True
print long_substr(['Oh, hello, my friend.',
'I prefer Jelly Belly beans.',
'When hell freezes over!'])
毫无疑问,算法可以改进,而且我对 Python 的接触不多,所以也许它在语法上也可以更有效,但它应该可以完成工作.
No doubt the algorithm could be improved and I've not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.
编辑:内联第二个 is_substr 函数,如 J.F. Sebastian 所示.用法保持不变.注意:算法没有变化.
in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.
def long_substr(data):
substr = ''
if len(data) > 1 and len(data[0]) > 0:
for i in range(len(data[0])):
for j in range(len(data[0])-i+1):
if j > len(substr) and all(data[0][i:i+j] in x for x in data):
substr = data[0][i:i+j]
return substr
希望这会有所帮助,
杰森.
这篇关于来自两个以上字符串的最长公共子字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!