Detect repetitions in string

Problem description

I have a simple problem, but can't come up with a simple solution :)

Let's say I have a string. I want to detect if there is a repetition in it.

I'd like:

"blablabla" # => (bla, 3)

"rablabla"  # => (bla, 2)

The thing is I don't know what pattern I am searching for (I don't have "bla" as input).

Any idea?

EDIT:
Seeing the comments, I think I should be a bit more precise about what I have in mind:

  • In a string, there is either a pattern that is repeated or not.
  • The repeated pattern can be of any length.

If there is a pattern, it is repeated over and over again until the end. But the string can end in the middle of the pattern.

Example:

"testblblblblb" # => ("bl",4) 

Solution

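import re

def repetitions(s):
    # (.+?) lazily captures the shortest candidate unit; \1+ requires at least one more copy
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(s):
        # Integer division keeps the repeat count an int on Python 3
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))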

This finds all non-overlapping repeated matches, scanning left to right and using the shortest possible unit of repetition:

>>> list(repetitions("blablabla"))
[('bla', 3)]
>>> list(repetitions("rablabla"))
[('abl', 2)]
>>> list(repetitions("aaaaa"))
[('a', 5)]
>>> list(repetitions("aaaaablablabla"))
[('a', 5), ('bla', 3)]
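
The regex reports any repeated run, wherever it sits in the string. For the stricter case described in the question's edit (the pattern repeats until the end, and the string may stop in the middle of the pattern), a brute-force check is one possible approach. What follows is only a sketch under that reading of the requirement: `trailing_repetition` is a hypothetical helper name, not part of the original answer, and its worst case is roughly cubic in the string length, so it is meant for short inputs.

```python
def trailing_repetition(s):
    """Return (unit, count) for the earliest repetition that runs to the end of s.

    The final copy of the unit may be cut short; only complete copies are counted.
    Returns None when no suffix of s repeats a unit at least twice.
    """
    n = len(s)
    for start in range(n):
        tail_len = n - start
        # Try the shortest unit first; it must fit at least twice in the tail.
        for unit_len in range(1, tail_len // 2 + 1):
            # The tail repeats with period unit_len if every character equals
            # the one unit_len positions before it.
            if all(s[i] == s[i - unit_len] for i in range(start + unit_len, n)):
                return s[start:start + unit_len], tail_len // unit_len
    return None


print(trailing_repetition("testblblblblb"))  # ('bl', 4)
print(trailing_repetition("rablabla"))       # ('abl', 2)
print(trailing_repetition("abc"))            # None
```

As with the regex, the unit found for "rablabla" is 'abl' rather than 'bla', because the earliest starting position wins.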
