Regex C# optional group - should act greedy?

Question

I have this regex:

blablabla.+?(?:<a href="(http://.+?)" target="_blank">)?

I want to capture a URL if I find one. The regex finds stuff, but I don't get the link (the capture is always empty). Now if I remove the question mark at the end, like this:

blablabla.+?(?:<a href="(http://.+?)" target="_blank">)

This will only match stuff that has the link at the end... it's 2:40 am and I've got no ideas...
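To make the two symptoms concrete, here is a minimal C# sketch (top-level statements, System.Text.RegularExpressions) that runs both patterns from the question against the sample input given below and against a made-up line without a link. The variable names and the second input line are illustrative assumptions, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

// The two patterns from the question (verbatim strings: "" is a literal quote).
string optionalGroup = @"blablabla.+?(?:<a href=""(http://.+?)"" target=""_blank"">)?";
string requiredGroup = @"blablabla.+?(?:<a href=""(http://.+?)"" target=""_blank"">)";

// Sample input from the question, plus a hypothetical line without a link.
string withLink = @"blablabla asd 1234t535 <a href=""http://google.com"" target=""_blank"">";
string withoutLink = "blablabla asd 1234t535";

// Trailing ?: the line matches, but group 1 stays empty.
Match a = Regex.Match(withLink, optionalGroup);
Console.WriteLine($"optional, link:    success={a.Success}, group 1='{a.Groups[1].Value}'");

// No trailing ?: the URL is captured...
Match b = Regex.Match(withLink, requiredGroup);
Console.WriteLine($"required, link:    success={b.Success}, group 1='{b.Groups[1].Value}'");

// ...but a line without a link no longer matches at all.
Match c = Regex.Match(withoutLink, requiredGroup);
Console.WriteLine($"required, no link: success={c.Success}");
```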

-- EDIT --

Sample input:

blablabla asd 1234t535 <a href="http://google.com" target="_blank">

Expected output:

match 0:

    group 1: <a href="http://google.com" target="_blank">
    group 2: http://google.com

I just want "http://google.com" or ""

Answer

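It's the trailing ? that's doing you in. Reason: the lazy .+? gives up as early as it can, and because the group is optional and nothing follows it in the pattern, the engine is allowed to stop once .+? has taken a single character. The group is simply skipped at that point, so the match succeeds with an empty capture.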
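A quick way to see this, assuming the sample input from the question, is to print the text the optional-group pattern actually matched. A minimal C# sketch (variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"blablabla asd 1234t535 <a href=""http://google.com"" target=""_blank"">";
string pattern = @"blablabla.+?(?:<a href=""(http://.+?)"" target=""_blank"">)?";

Match m = Regex.Match(input, pattern);

// The lazy .+? takes one character (the space). The optional group cannot
// start at "asd ...", so the engine skips it; nothing else is left in the
// pattern, so the match ends right there.
Console.WriteLine($"matched text: '{m.Value}'");                    // matched text: 'blablabla '
Console.WriteLine($"group 1 participated: {m.Groups[1].Success}"); // group 1 participated: False
```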

blablabla.*(?:<a href="((http://)?.*)".+target="_blank".*>)
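A minimal C# sketch checking this pattern against the sample input from the question. The only change is C# verbatim-string escaping ("" for each literal quote); the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern as a C# verbatim string ("" is a literal quote).
string pattern = @"blablabla.*(?:<a href=""((http://)?.*)"".+target=""_blank"".*>)";
string input = @"blablabla asd 1234t535 <a href=""http://google.com"" target=""_blank"">";

Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Success);         // True
Console.WriteLine(m.Groups[1].Value); // http://google.com (the whole href value)
Console.WriteLine(m.Groups[2].Value); // http:// (the optional scheme group)
```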

I modified it slightly. Once the group is no longer optional, .* works about as well as .+? in front of it: either one has to keep extending until the <a href can match (the greedy .* just settles on the last such tag on the line rather than the first). If your href may be empty (you indicated you wanted ""), you need to make the http optional as well as the trailing text. Also, the .+ in front of target means you have at least one space or character, but may have more (multiple blanks or other attributes). The .* before the > means you can have blanks or other attributes trailing after it.
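The sketch below tries the answer's pattern on a few hypothetical lines. An empty href and extra attributes after target behave as described, but an attribute placed between href and target is a caveat: the greedy .* inside the capture can run past the closing quote of the href. Restricting that part to [^"]* is one possible fix; it is a variant that is not part of the answer above. The helper name Url and all test lines are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

string answer = @"blablabla.*(?:<a href=""((http://)?.*)"".+target=""_blank"".*>)";
// Variant (not from the answer): [^"]* cannot cross a quote, so the capture
// stops at the closing quote of the href.
string noQuote = @"blablabla.*(?:<a href=""((http://)?[^""]*)"".+target=""_blank"".*>)";

// Hypothetical test lines.
string emptyHref = @"blablabla <a href="""" target=""_blank"">";
string trailingAttr = @"blablabla <a href=""http://google.com"" target=""_blank"" rel=""noopener"">";
string middleAttr = @"blablabla <a href=""http://google.com"" class=""x"" target=""_blank"">";

// Hypothetical helper: returns group 1 of the first match (or "" if none).
string Url(string pattern, string line) => Regex.Match(line, pattern).Groups[1].Value;

Console.WriteLine($"'{Url(answer, emptyHref)}'");    // ''
Console.WriteLine($"'{Url(answer, trailingAttr)}'"); // 'http://google.com'
Console.WriteLine($"'{Url(answer, middleAttr)}'");   // 'http://google.com" class="x'
Console.WriteLine($"'{Url(noQuote, middleAttr)}'");  // 'http://google.com'
```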

This will not match a line at all if there's no <a href...>, but that's what you want, right?
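If you do want "" for lines without a link, one option (an assumption about how the result is consumed, not something the answer states) is to check Match.Success in C# and fall back to an empty string. A sketch with a hypothetical helper UrlOrEmpty:

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"blablabla.*(?:<a href=""((http://)?.*)"".+target=""_blank"".*>)";

// Hypothetical helper: the href value, or "" when the line has no link.
string UrlOrEmpty(string line)
{
    Match m = Regex.Match(line, pattern);
    return m.Success ? m.Groups[1].Value : "";
}

string withLink = @"blablabla asd 1234t535 <a href=""http://google.com"" target=""_blank"">";
string withoutLink = "blablabla asd 1234t535";

Console.WriteLine($"'{UrlOrEmpty(withLink)}'");    // 'http://google.com'
Console.WriteLine($"'{UrlOrEmpty(withoutLink)}'"); // ''
```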

The (?: ... ) can be dropped completely: it is a non-capturing group with no quantifier, so removing it changes neither what matches nor the group numbers.
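A short C# check, assuming the sample input from the question: with the wrapper removed, the href value stays in group 1. Turning the wrapper into a capturing group instead (removing the ?:, a variant not in the answer) shifts the numbering so that group 1 holds the whole tag and group 2 the URL, which is the layout listed under "Expected output" in the question. Variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = @"blablabla asd 1234t535 <a href=""http://google.com"" target=""_blank"">";

// Wrapper removed: the href value is still group 1.
string unwrapped = @"blablabla.*<a href=""((http://)?.*)"".+target=""_blank"".*>";
// Wrapper made capturing (variant): group 1 is the tag, group 2 the URL.
string capturing = @"blablabla.*(<a href=""((http://)?.*)"".+target=""_blank"".*>)";

Match u = Regex.Match(input, unwrapped);
Match c = Regex.Match(input, capturing);

Console.WriteLine(u.Groups[1].Value); // http://google.com
Console.WriteLine(c.Groups[1].Value); // <a href="http://google.com" target="_blank">
Console.WriteLine(c.Groups[2].Value); // http://google.com
```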

This will fail if the attributes are not listed in the order specified, which is one of the reasons regex can't really be used to parse HTML. But if you're certain the href will always come before target, this should do what you need.
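To illustrate the ordering limitation, here is a sketch with two hypothetical lines that differ only in attribute order; the pattern is the one from the answer, and the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string pattern = @"blablabla.*(?:<a href=""((http://)?.*)"".+target=""_blank"".*>)";

// Same link, attributes in opposite order (hypothetical inputs).
string hrefFirst = @"blablabla <a href=""http://google.com"" target=""_blank"">";
string targetFirst = @"blablabla <a target=""_blank"" href=""http://google.com"">";

// The pattern requires the literal text <a href=" right after the tag name,
// so the second line never matches.
Console.WriteLine(Regex.IsMatch(hrefFirst, pattern));   // True
Console.WriteLine(Regex.IsMatch(targetFirst, pattern)); // False
```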
