Regex not being greedy enough


Problem description

I've got the following regex that was working perfectly until a new situation arose

^.*[?&]U(?:RL)?=(?<URL>.*)$

Basically, it's used against URLs to grab EVERYTHING after the U= or URL= and return it in the URL group

So, for the following

http://localhost?a=b&u=http://otherhost?foo=bar

URL = http://otherhost?foo=bar

Unfortunately an odd case came up

http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost

Ideally, I want URL to be "http://otherhost?foo=bar&url=http://someotherhost"; instead, it is just "http://someotherhost"

EDIT: I think this fixed it...though it's not pretty

^.*[?&](?<![?&]U(?:RL)?=.*)U(?:RL)?=(?<URL>.*)$

Solution

The issue

The problem is not that .* is not being greedy enough; it's that the other .* that appears earlier is also greedy.

To illustrate the issue, let's consider a different example. Consider the following two patterns; they're identical, except that \1 is reluctant in the second pattern:

              \1 greedy, \2 greedy         \1 reluctant, \2 greedy
              ^([0-5]*)([5-9]*)$           ^([0-5]*?)([5-9]*)$

Here we have two capturing groups. \1 captures [0-5]*, and \2 captures [5-9]*. Here's a side-by-side comparison of what these patterns match and capture:

              \1 greedy, \2 greedy          \1 reluctant, \2 greedy
              ^([0-5]*)([5-9]*)$            ^([0-5]*?)([5-9]*)$
Input         Group 1    Group 2            Group 1    Group 2
54321098765   543210     98765              543210     98765
007           00         7                  00         7
0123456789    012345     6789               01234      56789
0506          050        6                  050        6
555           555        <empty>            <empty>    555
5550555       5550555    <empty>            5550       555

Note that as greedy as \2 is, it can only grab what \1 didn't already grab first! Thus, if you want \2 to grab as many 5s as possible, you have to make \1 reluctant, so that the 5s are actually up for grabs by \2.
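To check the table yourself, here is a minimal sketch in Python (my choice of language; the original question does not name a regex flavor). The helper name `capture` is made up for illustration. It runs both patterns over the same inputs and prints what each group captured.

```python
import re

# The two patterns from the table: identical except that group 1
# is greedy in the first and reluctant (lazy) in the second.
GREEDY = re.compile(r"^([0-5]*)([5-9]*)$")
RELUCTANT = re.compile(r"^([0-5]*?)([5-9]*)$")

INPUTS = ["54321098765", "007", "0123456789", "0506", "555", "5550555"]


def capture(pattern, text):
    """Return (group 1, group 2) for a match, or None if there is no match."""
    m = pattern.match(text)
    return m.groups() if m else None


if __name__ == "__main__":
    for text in INPUTS:
        print(f"{text:<12} greedy={capture(GREEDY, text)} "
              f"reluctant={capture(RELUCTANT, text)}")
```

The two patterns only disagree on inputs where a 5 sits at the boundary between the groups, which is exactly the case where it matters who gets to grab first.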

The fix

So applying this to your problem, there are two ways you can fix it. You can make the first .* reluctant (see on rubular.com):

^.*?[?&]U(?:RL)?=(?<URL>.*)$

Alternatively you can just get rid of the prefix matching part altogether (see on rubular.com):

[?&]U(?:RL)?=(?<URL>.*)$
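As a rough comparison, here is a sketch in Python that runs the original pattern and both fixes against the odd-case URL from the question. A few things here are my assumptions, not part of the question: Python spells named groups `(?P<URL>...)` rather than `(?<URL>...)`, I pass `re.IGNORECASE` because the sample URLs use lowercase `u=` and `url=` while the pattern uses uppercase, and the helper name `extract_url` is made up for illustration.

```python
import re

# The odd-case input from the question.
SAMPLE = "http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost"

# Python syntax for named groups is (?P<name>...); otherwise the
# patterns are the ones discussed above.
PATTERNS = {
    "original (greedy prefix)": r"^.*[?&]U(?:RL)?=(?P<URL>.*)$",
    "fix 1 (reluctant prefix)": r"^.*?[?&]U(?:RL)?=(?P<URL>.*)$",
    "fix 2 (no prefix)": r"[?&]U(?:RL)?=(?P<URL>.*)$",
}


def extract_url(pattern, text):
    """Return the URL group, or None if the pattern does not match.

    re.search is used (not re.match) so the unanchored pattern can
    start matching in the middle of the string. Case-insensitive
    matching is an assumption based on the lowercase sample URLs.
    """
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group("URL") if m else None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(f"{name}: {extract_url(pattern, SAMPLE)}")
```

With the greedy prefix, the first .* swallows as much as it can and only gives back enough to find the last U=/URL=, so the group ends up with just the tail. Both fixes let the match start at the first U=/URL= instead, and the capturing .* then takes everything to the end of the string.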
