How do I regex match with grouping with unknown number of groups

Question

I want to do a regex match (in Python) on the output log of a program. The log contains some lines that look like this:

... 
VALUE 100 234 568 9233 119
... 
VALUE 101 124 9223 4329 1559
...

I would like to capture the list of numbers on the first line that starts with VALUE, i.e., I want it to return ('100','234','568','9233','119'). The problem is that I do not know in advance how many numbers there will be.

I tried to use this as a regex:

VALUE (?:(\d+)\s)+

This matches the line, but it only captures the last value, so I just get ('119',).
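The behaviour described above can be reproduced with a short sketch. It assumes the log line ends with a newline (the pattern requires whitespace after every number, so without it the final number would not be consumed). A repeated capturing group keeps only the text from its last iteration, which is why a single value comes back.

```python
import re

# Assumption: the line still carries its trailing newline, as it would when
# read from a log file; the pattern needs whitespace after each number.
line = "VALUE 100 234 568 9233 119\n"

m = re.match(r"VALUE (?:(\d+)\s)+", line)

# Group 1 is overwritten on every repetition, so only the last number survives.
print(m.groups())  # ('119',)
```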

Recommended answer

What you're looking for is a parser, instead of a regular expression match. In your case, I would consider using a very simple parser, split():

s = "VALUE 100 234 568 9233 119"
a = s.split()  # split on runs of whitespace
if a and a[0] == "VALUE":
    print([int(x) for x in a[1:]])  # [100, 234, 568, 9233, 119]

You can use a regular expression to check that an input line matches the expected format (using the regex in your question). Once a line has matched, you can run the above code without checking for "VALUE", and the int(x) conversion will always succeed because you have already confirmed that every following group of characters consists only of digits.
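One way to combine the two steps is sketched below. The function name first_values and the exact validation pattern are assumptions made for illustration; the pattern is an anchored variant of the regex in the question rather than that regex verbatim. The regex only decides whether a line is well formed, and split() does the actual extraction.

```python
import re

# Assumed format: "VALUE" followed by one or more whitespace-separated
# integers, with optional trailing whitespace.
VALUE_LINE = re.compile(r"VALUE(?:\s+\d+)+\s*")

def first_values(log_text):
    """Return the numbers (as strings) from the first well-formed VALUE line, or None."""
    for line in log_text.splitlines():
        if VALUE_LINE.fullmatch(line):
            # The pattern guarantees every token after "VALUE" is all digits,
            # so no further checks are needed before splitting.
            return tuple(line.split()[1:])
    return None

log = """...
VALUE 100 234 568 9233 119
...
VALUE 101 124 9223 4329 1559
..."""

print(first_values(log))  # ('100', '234', '568', '9233', '119')
```

If integers are wanted instead of strings, the return line can wrap each token in int(), which cannot fail for a line that has passed the check.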
