Find out how many times a regex matches in a string in Python

Question

Is there a way that I can find out how many matches of a regex are in a string in Python? For example, if I have the string "It actually happened when it acted out of turn."

I want to know how many times "t a" appears in the string. In that string, "t a" appears twice. I want my function to tell me it appeared twice. Is this possible?

Recommended answer

The existing solutions based on findall are fine for non-overlapping matches (and no doubt optimal, except maybe for a HUGE number of matches), although alternatives such as sum(1 for m in re.finditer(thepattern, thestring)) (which avoid ever materializing the list when all you care about is the count) are also quite possible. Somewhat idiosyncratic would be using subn and ignoring the resulting string:
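As a minimal sketch of the findall and finditer approaches, applied to the string and pattern from the question (the variable names count_findall and count_finditer are only illustrative):

```python
import re

thestring = "It actually happened when it acted out of turn."
thepattern = "t a"

# findall builds the full list of matches; its length is the count.
count_findall = len(re.findall(thepattern, thestring))

# finditer yields match objects lazily, so no list is ever materialized.
count_finditer = sum(1 for _ in re.finditer(thepattern, thestring))

print(count_findall, count_finditer)  # prints: 2 2
```

Both count non-overlapping matches only, scanning left to right.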

import re

def countnonoverlappingrematches(pattern, thestring):
  # subn returns (new_string, number_of_substitutions); keep only the count
  return re.subn(pattern, '', thestring)[1]

The only real advantage of this latter idea would come if you only cared to count (say) up to 100 matches; then, re.subn(pattern, '', thestring, count=100)[1] might be practical (returning 100 whether there are 100 matches, or 1000, or even larger numbers).
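A small sketch of that capped count, assuming a hypothetical helper named count_matches_capped (not part of the re module) that passes the cap through subn's count argument:

```python
import re

def count_matches_capped(pattern, thestring, cap):
    # Hypothetical helper: stop substituting (and therefore counting) after
    # `cap` matches. Note that in re.subn, count=0 means "no limit".
    return re.subn(pattern, '', thestring, count=cap)[1]

print(count_matches_capped('a', 'a' * 1000, 100))  # prints: 100
print(count_matches_capped('a', 'aaa', 100))       # prints: 3
```

The result is therefore best read as "at least this many" whenever it equals the cap.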

Counting overlapping matches requires you to write more code, because the built-in functions in question are all focused on NON-overlapping matches. There's also a problem of definition: e.g., with pattern being 'a+' and thestring being 'aa', would you consider this to be just one match, or three (the first a, the second one, both of them), or...?
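To make the two extremes of that definition concrete, here is an illustrative sketch. The helper count_every_matching_substring is hypothetical and deliberately brute-force (quadratic in the string length); it counts every non-empty substring that the pattern matches in full:

```python
import re

def count_every_matching_substring(pattern, thestring):
    # Hypothetical helper: try every (start, end) slice of the string and
    # count those the pattern matches completely.
    compiled = re.compile(pattern)
    n = len(thestring)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n + 1)
        if compiled.fullmatch(thestring, i, j)
    )

print(len(re.findall('a+', 'aa')))                 # prints: 1
print(count_every_matching_substring('a+', 'aa'))  # prints: 3
```

The non-overlapping count sees a single greedy match, while the exhaustive count sees the first a, the second a, and both together.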

Assuming for example that you want possibly-overlapping matches starting at distinct spots in the string (which then would give TWO matches for the example in the previous paragraph):

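import re

def countoverlappingdistinct(pattern, thestring):
  total = 0
  start = 0
  there = re.compile(pattern)  # compiled so that search() accepts a start position
  while True:
    mo = there.search(thestring, start)
    if mo is None:
      return total
    total += 1
    # resume one character past where this match began, so overlaps are found
    start = 1 + mo.start()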

Note that you do have to compile the pattern into an RE object in this case: the module-level function re.search does not accept a starting position for the search the way the search method of a compiled pattern does (its second argument, pos). Without it you would have to slice thestring as you go, which is definitely more effort than just having the next search start at the next possible distinct starting point, which is what I'm doing in this function.
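For comparison, the slicing alternative mentioned above might look roughly like this. The function name count_overlapping_by_slicing is hypothetical, and the sketch assumes the pattern has no anchors or lookbehinds, since slicing changes what those see:

```python
import re

def count_overlapping_by_slicing(pattern, thestring):
    # Hypothetical variant: re.search has no start argument, so each round
    # searches a fresh slice of the string (which copies it every time).
    total = 0
    offset = 0
    while offset <= len(thestring):
        mo = re.search(pattern, thestring[offset:])
        if mo is None:
            break
        total += 1
        # mo.start() is relative to the slice, so add it to the running offset
        offset += mo.start() + 1
    return total

print(count_overlapping_by_slicing('a+', 'aa'))    # prints: 2
print(count_overlapping_by_slicing('aa', 'aaaa'))  # prints: 3
```

Besides the extra bookkeeping, the repeated slicing copies the remainder of the string on every iteration, which the compiled-pattern version avoids.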
