Find the indexes of all regex matches?


Problem description

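I'm parsing strings that could have any number of quoted strings inside them (I'm parsing code, and trying to avoid PLY). I want to find out whether a substring is quoted, and I have the substring's index. My initial thought was to use re to find all the matches and then figure out the range of indexes each one covers.

It seems like I should use re with a regex like \"[^\"]+\"|'[^']+' (I'm avoiding triple-quoted and similar strings for now). When I use findall() I get a list of the matching strings, which is somewhat nice, but I need indexes.

My substring might be as simple as c, and I need to figure out whether this particular c is actually quoted or not.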

Solution

This is what you want, quoting the documentation for the re module:

re.finditer(pattern, string[, flags])

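Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.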

You can then get the start and end positions from the MatchObjects.

e.g.

[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)]
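To connect this back to the question, here is a minimal sketch of how the spans from finditer() could be used to decide whether a given index is quoted. The helper names quoted_spans and is_quoted and the sample string are illustrative assumptions, not part of the original answer; the pattern is the one from the question, so it still ignores escaped quotes and triple-quoted strings.

```python
import re

# Pattern from the question: a double-quoted or single-quoted run.
# It does not handle escaped quotes or triple-quoted strings.
QUOTED = re.compile(r'"[^"]+"|\'[^\']+\'')


def quoted_spans(text):
    """Return a list of (start, end) index pairs, one per quoted string.

    Hypothetical helper: end is exclusive, as with m.end().
    """
    return [m.span() for m in QUOTED.finditer(text)]


def is_quoted(text, index):
    """Return True if the character at index falls inside a quoted string.

    Hypothetical helper: the quote characters themselves count as inside.
    """
    return any(start <= index < end for start, end in quoted_spans(text))


source = 'x = "a c b" + c'
print(quoted_spans(source))   # spans of the quoted strings
print(is_quoted(source, 7))   # the c inside the quotes
print(is_quoted(source, 14))  # the bare c at the end
```

m.span() returns the same pair as (m.start(0), m.end(0)). If many indexes are checked against the same string, it may be worth computing the spans once and reusing them rather than rescanning on every call.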

