最长重复子串 [英] Longest repeated substring

查看:43
本文介绍了最长重复子串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在我去上学的同时学习 Python.本质上,我需要在字符串列表中找到最长的重复子字符串,如这篇文章,并对我应该做的事情有所了解.

learning python as I go for school work. Essentially, I need to find the longest repeated substring in a list of string as shown here. I have been reading through this article and have an understanding of what I should do.

到目前为止,我的实现如下:

so far my implementation is as follows:

def long_rptr_subString(testList):
    longSubstring = ''

    if len(testList) > 1 and len(testList[0]) > 0:
        for i in range(len(testList[0])):
            for j in range(len(testList[0])-i+1):
                if j > len(longSubstring) and all(testList[0][i:i+j] in x for x in testList):
                    longSubstring = testList[0][i:i+j]
    return longSubstring

现在当我用 ['slide','glidb','flidt','cridz','bidr'] 调用函数时作为我最长的子字符串,我得到了'id'的正确结果.

now when I call my function with let's say ['slide', 'glidb', 'flidt', 'cridz', 'bidr'] I get the correct result of 'id' as being my longest substring.

但是,当我通过列表 ['slide','glidb','flidt','cridz','bidr','balh','tejka','djakljskdl','blah','等等',我没有得到任何返回结果.我应该取回'blah'作为我最长的子字符串,但是我还没有找到实现此目的的方法.看来我只能在列表的所有项目中都匹配子字符串.如何修改代码/逻辑以获取出现多次的最长子字符串?

However, when I pass the list ['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'] I don't get any return results. I should be getting back 'blah' as my longest substring, but I haven't figured out a way to accomplish this. It seems I am only able to match substrings when they are in ALL items of the list. How might I modify my code/logic to get the longest substring that occurs more than once?

谢谢.

推荐答案

您只能匹配 all 项目中的子字符串,因为这正是您要的:

You can only match substrings in all items because that's exactly what you ask for:

all(testList[0][i:i+j] in x for x in testList)

即使您进行了更改,也只能找到 first 子字符串中最长的子字符串,因为您只能通过 testlist [0] 进行检查.

Even if you change that, you can only find the longest substring that is in the first substring, because you only check through testlist[0] .

相反,请尝试以下操作:

Instead, try something like:

def longest_substr(lst):
    longest = None
    for word in lst:
        for i in range(len(word)):
            for j in range(i+1, len(word)+1):
                if ((longest is None or (j - i > len(longest))) and
                    sum(word[i:j] in w for w in lst) > 1):
                    longest = word[i:j]
    return longest

这会找到至少两个(> 1 )个单词中最长的子字符串(或者对于空列表或空字符串列表,不返回)并给出以下结果:

This finds the longest substring that's in at least two (> 1) of the words (or will return None for an empty list or list of empty strings) and gives me the following results:

>>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr'])
'lid'
>>> longest_substr(['slide', 'glidb', 'flidt', 'cridz', 'bidr', 'balh', 'tejka', 'djakljskdl', 'blah', 'blah', 'blah'])
'blah'

请注意,一旦删除了必须在所有字符串中包含子字符串的要求,则第一个单词列表中最长的实际上就是'lid'.

Note that, once you remove the requirement that the substring must be in all strings, the longest in your first list of words is actually 'lid'.

这篇关于最长重复子串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆