Capturing repeating subpatterns in Python regex

Question

While matching an email address, after I match something like yasar@webmail, I want to capture one or more of (\.\w+) (what I'm actually doing is a bit more complicated; this is just an example). I tried adding (\.\w+)+, but it only captures the last match. For example, yasar@webmail.something.edu.tr matches, but the group only contains .tr after the yasar@webmail part, so I lose the .something and .edu groups. Can I do this with Python regular expressions, or would you suggest matching everything first and splitting the subpatterns later?

Answer

The standard library's re module doesn't support repeated captures, but the third-party regex module (pip install regex) does:

>>> import regex
>>> m = regex.match(r'([.\w]+)@((\w+)(\.\w+)+)', 'yasar@webmail.something.edu.tr')
>>> m.groups()
('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
>>> m.captures(4)
['.something', '.edu', '.tr']
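
For comparison, here is a minimal sketch of the same pattern run through the standard-library re module (the variable names groups and last are illustrative only). It reproduces the behavior described in the question: group 4 keeps only its last repetition, and re offers no captures() method to recover the earlier ones.

```python
import re

# Same pattern as the regex example, but compiled with the stdlib re module.
pattern = re.compile(r'([.\w]+)@((\w+)(\.\w+)+)')
m = pattern.match('yasar@webmail.something.edu.tr')

# In re, a repeated capturing group only remembers its final repetition.
groups = m.groups()
last = m.group(4)

print(groups)  # ('yasar', 'webmail.something.edu.tr', 'webmail', '.tr')
print(last)    # .tr
```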

In your case, I'd split the repeated subpatterns afterwards. It leads to simple, readable code; see, for example, the code in @Li-aung Yip's answer.
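
A minimal sketch of the match-then-split approach, using only the re module. The EMAIL_RE pattern and the split_address helper are hypothetical names chosen for this example, and this is not the code from the answer mentioned above. The repeated suffixes are captured as one non-repeating group and then broken apart with re.findall:

```python
import re

# Hypothetical pattern: user, host, then all ".xxx" suffixes as ONE group.
# The inner (?:...) is non-capturing, so group 3 holds the whole tail.
EMAIL_RE = re.compile(r'([.\w]+)@(\w+)((?:\.\w+)+)')


def split_address(address):
    """Hypothetical helper: return (user, host, suffixes), or None if no match."""
    m = EMAIL_RE.fullmatch(address)
    if m is None:
        return None
    user, host, tail = m.groups()
    # tail looks like '.something.edu.tr'; findall recovers every repetition.
    suffixes = re.findall(r'\.\w+', tail)
    return user, host, suffixes


print(split_address('yasar@webmail.something.edu.tr'))
# ('yasar', 'webmail', ['.something', '.edu', '.tr'])
```

Running findall over the captured tail yields the same list that regex's captures(4) returns in the example above, without adding a third-party dependency. An address with no dotted suffix, such as yasar@webmail, does not match and returns None.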