Find all occurrences of a substring (including overlaps)?
Question
Okay, so I found this: How to find all occurrences of a substring?
It says that, to get a list of the indices of overlapping occurrences of a substring, you can use:
[m.start() for m in re.finditer('(?=SUBSTRING)', 'STRING')]
Which works, but my problem is that both the string and the substring to look for are defined by variables. I don't know enough about regular expressions to know how to deal with it - I can get it to work with non-overlapping substrings, that's just:
[m.start() for m in re.finditer(p3, p1)]
Because someone asked, I'll go ahead and specify. p1 and p3 could be any strings, but if they were, for example, p3 = "tryt" and p1 = "trytryt", the result should be [0, 3].
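To make the difference concrete, here is a minimal sketch that runs both patterns from the question on the example values above. The variable names non_overlapping and overlapping are only illustrative, and the lookahead pattern has the substring hard-coded, which is exactly the limitation being asked about.

```python
import re

p3 = "tryt"
p1 = "trytryt"

# Plain pattern: each match consumes its characters, so the second
# occurrence (which shares the "t" at index 3) is never seen.
non_overlapping = [m.start() for m in re.finditer(p3, p1)]

# Lookahead with the substring hard-coded: each match is zero-width,
# so every position in p1 is tested as a possible start.
overlapping = [m.start() for m in re.finditer('(?=tryt)', p1)]

print(non_overlapping)  # [0]
print(overlapping)      # [0, 3]
```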
Recommended answer
The arguments to re.finditer are simple strings. If you have the substring in a variable, simply format it into the regular expression. Something like '(?={0})'.format(p3) is a start. Since various symbols have special meaning in a regular expression, you will want to escape them. Luckily, the re module includes re.escape for just such a need.
[m.start() for m in re.finditer('(?={0})'.format(re.escape(p3)), p1)]
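As a rough, self-contained sketch, the one-liner can be wrapped in a helper. The name find_all_overlapping and the handling of an empty substring are assumptions made for illustration, not part of the original answer. The second and third calls show why re.escape matters: without it, the "." in "a.a" would match any character.

```python
import re

def find_all_overlapping(needle, haystack):
    """Return the start index of every occurrence of needle in haystack,
    including occurrences that overlap each other."""
    if not needle:
        # Assumption: an empty substring yields no matches. Without this
        # guard, the empty lookahead would match at every position.
        return []
    # re.escape neutralises regex metacharacters in the substring;
    # the (?=...) lookahead keeps each match zero-width.
    pattern = re.compile('(?={0})'.format(re.escape(needle)))
    return [m.start() for m in pattern.finditer(haystack)]

print(find_all_overlapping("tryt", "trytryt"))  # [0, 3]
print(find_all_overlapping("a.a", "a.a.a"))     # [0, 2]
print(find_all_overlapping("a.a", "abaca"))     # []
```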