Optional Capture Group Not Capturing

Problem description

<tr><td align=right>Name:</td><td align=left><b><font color=black>Nathan</font></b></td></tr>
<tr><td align=right>Extension:</td><td align=left><b>222</b></td></tr>

I have the HTML text above (it can't be changed), and I'd like a regular expression that returns three capturing groups: the label (Name|Extension), the font color (black|red), and the data (\w+).

I'm having trouble returning capture group 2, the font color. As you can see, it isn't present on the "Extension" row of the table, so I've made the capture group optional. When I do that, the group doesn't capture anything on the first row either. I've tried many combinations of quantifiers by trial and error, but I still can't get the result I'm looking for.

Here's the pattern I have so far:

(Name|Extension):.*?(?:<font color=(black|red)>)?.*?>(\w+)

I believe the .*? is consuming what would be the optional capture group, so only the 1st and 3rd groups match. If someone could explain where I've gone wrong, that would be great.

As someone trying to learn more about regular expressions, I'd appreciate it if people would treat the data above as immutable text rather than HTML.

Recommended answer

The problem is the reluctant quantifiers. The first .*? consumes nothing at first, which lets the next part of the regex try to match the FONT tag immediately after the colon. It doesn't find one there, but that's okay because that part is optional. Then the second .*? takes over, consuming only as much as it has to until >(\w+) can match. So if there is a FONT tag, it gets matched by the second .*?, not by the optional group as you intended.
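To make that concrete, here is a minimal sketch in Python (my choice for illustration; the question does not name a language). It takes the asker's pattern and wraps each .*? in an extra capturing group so the text each one consumes can be inspected. The extra groups shift the numbering, and the variable names are hypothetical.

```python
import re

row = ("<tr><td align=right>Name:</td><td align=left><b>"
       "<font color=black>Nathan</font></b></td></tr>")

# The asker's pattern, with each .*? wrapped in its own group for inspection.
# Groups: 1 = label, 2 = first .*?, 3 = color, 4 = second .*?, 5 = data.
instrumented = re.compile(
    r"(Name|Extension):(.*?)(?:<font color=(black|red)>)?(.*?)>(\w+)"
)

m = instrumented.search(row)
label, first_lazy, color, second_lazy, data = m.groups()

print(repr(first_lazy))   # ''
print(color)              # None
print(repr(second_lazy))  # '</td><td align=left><b><font color=black'
print(data)               # Nathan
```

The first lazy stretch is empty, the optional group is skipped, and the FONT tag ends up inside the second lazy stretch, which is the behavior described above.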

But don't bother making the quantifiers greedy; it might work, but more likely it will just fail less efficiently. Try this instead:

<td[^>]*>(Name|Extension):</td><td[^>]*><b>(?:<font color=(black|red)>)?([^<]*)<

Because I explicitly matched all the tags following the label, the regex is in the correct position to match the FONT tag if there is one. If it's there, group(2) will contain the color; otherwise it will be null.
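As a quick check, the following Python sketch runs the suggested pattern against the two sample rows. Python is an assumption on my part, and the variable names are hypothetical; in Python an unmatched optional group is reported as None rather than null.

```python
import re

html = """<tr><td align=right>Name:</td><td align=left><b><font color=black>Nathan</font></b></td></tr>
<tr><td align=right>Extension:</td><td align=left><b>222</b></td></tr>"""

# The pattern from the answer, split across two string literals for readability.
pattern = re.compile(
    r"<td[^>]*>(Name|Extension):</td><td[^>]*><b>"
    r"(?:<font color=(black|red)>)?([^<]*)<"
)

# One (label, color, data) tuple per matching row.
rows = [m.groups() for m in pattern.finditer(html)]
print(rows)
# [('Name', 'black', 'Nathan'), ('Extension', None, '222')]
```

Note that the data group here is [^<]* rather than \w+, so it would also accept values containing spaces or punctuation, and could be empty.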
