如何判断字符串是否在 Python 中重复? [英] How can I tell if a string repeats itself in Python?

查看:129
本文介绍了如何判断字符串是否在 Python 中重复?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种方法来测试给定的字符串是否在整个字符串中重复.

示例:

<预><代码>['0045662100456621004566210045662100456621',#'00456621''0072992700729927007299270072992700729927',#'00729927''001443001443001443001443001443001443001443',#'001443''037037037037037037037037037037037037037037037',#'037''047619047619047619047619047619047619047619',#'047619''002457002457002457002457002457002457002457',#'002457''001221001221001221001221001221001221001221',#'001221''001230012300123001230012300123001230012300123',#'00123''0013947001394700139470013947001394700139470013947',#'0013947''001001001001001001001001001001001001001001001001001',#'001''001406469760900140646976090014064697609',#'0014064697609']

是重复自己的字符串,并且

<预><代码>['004608294930875576​​036866359447','00469483568075117370892018779342723','004739336492890995260663507109','001508295625942684766214177978883861236802413273','007518796992481203','0071942446043165467625899280575539568345323741','0434782608695652173913','0344827586206896551724137931','002481389578163771712158808933','002932551319648093841642228739','0035587188612099644128113879','003484320557491289198606271777','00115074798619102416570771',]

是没有的例子.

给定的字符串的重复部分可能很长,字符串本身可能有 500 个或更多字符,因此循环遍历每个字符以尝试构建模式,然后检查模式与字符串的其余部分似乎慢得可怕.乘以可能有数百个字符串,我看不到任何直观的解决方案.

我对正则表达式进行了一些研究,当您知道要查找的内容或至少知道要查找的模式的长度时,它们似乎很适合.不幸的是,我都不知道.

如何判断一个字符串是否在重复,如果是,最短的重复子序列是什么?

解决方案

这里有一个简洁的解决方案,可以避免正则表达式和缓慢的 Python 内循环:

def principal_period(s):i = (s+s).find(s, 1, -1)如果 i == -1 则返回 None 否则 s[:i]

请参阅@davidism 发起的社区维基答案,了解基准测试结果.总之,

<块引用>

David Zhang 的解决方案显然是赢家,对于大型示例集,其性能至少比其他所有解决方案高 5 倍.

(那个答案的话,不是我的.)

这是基于观察到一个字符串是周期性的,当且仅当它等于自身的非平凡旋转.感谢@AleksiTorhamo 意识到我们可以从 (s+s)[1:-1] 中第一次出现 s 的索引中恢复本金期,并通知我 Python 的 string.find 的可选 startend 参数.

I'm looking for a way to test whether or not a given string repeats itself for the entire string or not.

Examples:

[
    '0045662100456621004566210045662100456621',             # '00456621'
    '0072992700729927007299270072992700729927',             # '00729927'
    '001443001443001443001443001443001443001443',           # '001443'
    '037037037037037037037037037037037037037037037',        # '037'
    '047619047619047619047619047619047619047619',           # '047619'
    '002457002457002457002457002457002457002457',           # '002457'
    '001221001221001221001221001221001221001221',           # '001221'
    '001230012300123001230012300123001230012300123',        # '00123'
    '0013947001394700139470013947001394700139470013947',    # '0013947'
    '001001001001001001001001001001001001001001001001001',  # '001'
    '001406469760900140646976090014064697609',              # '0014064697609'
]

are strings which repeat themselves, and

[
    '004608294930875576036866359447',
    '00469483568075117370892018779342723',
    '004739336492890995260663507109',
    '001508295625942684766214177978883861236802413273',
    '007518796992481203',
    '0071942446043165467625899280575539568345323741',
    '0434782608695652173913',
    '0344827586206896551724137931',
    '002481389578163771712158808933',
    '002932551319648093841642228739',
    '0035587188612099644128113879',
    '003484320557491289198606271777',
    '00115074798619102416570771',
]

are examples of ones that do not.

The repeating sections of the strings I'm given can be quite long, and the strings themselves can be 500 or more characters, so looping through each character trying to build a pattern then checking the pattern vs the rest of the string seems awful slow. Multiply that by potentially hundreds of strings and I can't see any intuitive solution.

I've looked into regexes a bit and they seem good for when you know what you're looking for, or at least the length of the pattern you're looking for. Unfortunately, I know neither.

How can I tell if a string is repeating itself and if it is, what the shortest repeating subsequence is?

解决方案

Here's a concise solution which avoids regular expressions and slow in-Python loops:

def principal_period(s):
    i = (s+s).find(s, 1, -1)
    return None if i == -1 else s[:i]

See the Community Wiki answer started by @davidism for benchmark results. In summary,

David Zhang's solution is the clear winner, outperforming all others by at least 5x for the large example set.

(That answer's words, not mine.)

This is based on the observation that a string is periodic if and only if it is equal to a nontrivial rotation of itself. Kudos to @AleksiTorhamo for realizing that we can then recover the principal period from the index of the first occurrence of s in (s+s)[1:-1], and for informing me of the optional start and end arguments of Python's string.find.

这篇关于如何判断字符串是否在 Python 中重复?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆