python - regex search and findall


Problem description

I need to find all matches in a string for a given regex. I had been using findall() for this until I hit a case where it did not do what I expected. For example:

import re

regex = re.compile(r'(\d+,?)+')
s = 'There are 9,000,000 bicycles in Beijing.'

print(re.search(regex, s).group(0))
> 9,000,000

print(re.findall(regex, s))
> ['000']

In this case search() returns what I need (the longest match), but findall() behaves differently, although the docs imply it should be the same:

findall() matches all occurrences of a pattern, not just the first one as search() does.

  • Why is the behaviour different?

  • How can I achieve the result of search() with findall() (or something else)?

Recommended answer

Ok, I see what's going on... from the docs:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

As it turns out, your pattern does contain a group, (\d+,?), so findall() returns what that group captured rather than the whole match. Because the group is repeated, only its last repetition is kept, which is 000.
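A minimal sketch of that behaviour, reusing the sentence from the question. The variable names are chosen here only for illustration:

```python
import re

s = 'There are 9,000,000 bicycles in Beijing.'
pattern = re.compile(r'(\d+,?)+')

m = pattern.search(s)
whole_match = m.group(0)  # the entire matched text
last_group = m.group(1)   # only the final repetition of the group

# findall() reports the captured group, not the whole match
found = pattern.findall(s)

print(whole_match)
print(last_group)
print(found)
```

So search() and findall() locate the same match; they differ only in which part of it they hand back.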

One solution is to surround the entire regex with a group, like this:

regex = re.compile(r'((\d+,?)+)')
      

Then findall() returns [('9,000,000', '000')], a list holding one tuple with both captured groups. Of course, you only care about the first element of that tuple.
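As a hedged aside that the answer itself does not mention: if the inner group is only there for repetition, it can be made non-capturing with (?:...). With no capturing groups left in the pattern, findall() returns whole matches as plain strings. The sketch below compares both approaches; the variable names are illustrative only:

```python
import re

s = 'There are 9,000,000 bicycles in Beijing.'

# Wrapped version from the answer: one tuple per match
wrapped = re.findall(r'((\d+,?)+)', s)
first_elements = [groups[0] for groups in wrapped]

# Non-capturing variant: (?:...) groups without capturing
non_capturing = re.findall(r'(?:\d+,?)+', s)

print(wrapped)
print(first_elements)
print(non_capturing)
```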

Personally, I would use the following regex

regex = re.compile(r'((\d+,)*\d+)')
      

to avoid matching the trailing comma in input like "this is a bad number 9,123,".
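To make the difference concrete, here is a small sketch that runs both patterns on that "bad number" string. The names loose and strict are labels made up for this example:

```python
import re

s = 'this is a bad number 9,123,'

loose = re.compile(r'((\d+,?)+)')     # a comma may follow every digit run
strict = re.compile(r'((\d+,)*\d+)')  # the match must end on a digit run

loose_match = loose.search(s).group(0)
strict_match = strict.search(s).group(0)

print(loose_match)
print(strict_match)
```

The first pattern swallows the trailing comma, while the second stops at the last digit.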

Edit:

Here's a way to avoid having to surround the expression with parentheses or deal with tuples:

import re

s = "..."
regex = re.compile(r'(\d+,?)+')
it = re.finditer(regex, s)

# each item is a match object; group(0) is the whole match
for match in it:
    print(match.group(0))
      

finditer returns an iterator over all the matches found. The match objects it yields are the same kind that re.search returns, so group(0) gives the result you expect.
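If a list is more convenient than a loop, the same idea can be sketched as a list comprehension. The input string below is made up for illustration and is not from the question:

```python
import re

# Hypothetical input with several numbers in it
s = 'a 1,000 b 22 c 3,333,333'
regex = re.compile(r'(\d+,?)+')

# Collect the whole text of every match
matches = [m.group(0) for m in regex.finditer(s)]

# Match objects also carry positions, which findall() does not expose
spans = [m.span() for m in regex.finditer(s)]

print(matches)
```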
