Regex not being greedy enough

Views: 124 Published: 2020/4/27 4:06:03 Tags: regex language-agnostic regex-greedy

Problem description

I've got the following regex that was working perfectly until a new situation arose

^.*[?&]U(?:RL)?=(?<URL>.*)$

Basically, it's used against URLs to grab EVERYTHING after the U= or URL= and return it in the URL match

So, for the following

http://localhost?a=b&u=http://otherhost?foo=bar

URL = http://otherhost?foo=bar

Unfortunately an odd case came up

http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost

Ideally, I want URL to be "http://otherhost?foo=bar&url=http://someotherhost", instead, it is just "http://someotherhost"

EDIT: I think this fixed it...though it's not pretty

^.*[?&](?<![?&]U(?:RL)?=.*)U(?:RL)?=(?<URL>.*)$

Solution

The issue

The problem is not that .* is not being greedy enough; it's that the other .* that appears earlier is also greedy.
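A minimal sketch of this in Python, under a few assumptions: Python's re module spells named groups (?P<name>...) rather than (?<name>...), re.IGNORECASE stands in for whatever case-insensitive option the original engine used (the sample URLs use lowercase u= and url=), and the "prefix" group is a hypothetical addition that only exposes what the leading .* consumed.

```python
import re

# Assumptions: (?P<...>) is Python's spelling of the named group, re.IGNORECASE
# replaces the unknown case-insensitive option of the original engine, and
# "prefix" is a hypothetical group added only to show what the first .* ate.
PATTERN = re.compile(r"^(?P<prefix>.*)[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)


def split_greedy(text):
    """Return (prefix, URL) as captured by the two greedy .* parts."""
    m = PATTERN.match(text)
    return (m.group("prefix"), m.group("URL")) if m else None


sample = "http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost"
prefix, url = split_greedy(sample)
print("prefix:", prefix)  # everything up to the LAST "&url="
print("URL:   ", url)     # only what the first .* left over
```

The leading .* runs to the end of the input and backtracks only as far as it must, so it stops at the last u=/url= marker and the URL group gets just the tail.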

To illustrate the issue, let's consider a different example. Consider the following two patterns; they're identical, except that \1 is reluctant in the second pattern:

              \1 greedy, \2 greedy         \1 reluctant, \2 greedy
              ^([0-5]*)([5-9]*)$           ^([0-5]*?)([5-9]*)$

Here we have two capturing groups. \1 captures [0-5]*, and \2 captures [5-9]*. Here's a side-by-side comparison of what these patterns match and capture:

              \1 greedy, \2 greedy          \1 reluctant, \2 greedy
              ^([0-5]*)([5-9]*)$            ^([0-5]*?)([5-9]*)$
Input         Group 1    Group 2            Group 1    Group 2
54321098765   543210     98765              543210     98765
007           00         7                  00         7
0123456789    012345     6789               01234      56789
0506          050        6                  050        6
555           555        <empty>            <empty>    555
5550555       5550555    <empty>            5550       555

Note that as greedy as \2 is, it can only grab what \1 didn't already grab first! Thus, if you want \2 to grab as many 5s as possible, you have to make \1 reluctant, so the 5s are actually up for grabs by \2.
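The comparison above can be reproduced with a short Python sketch. The two patterns are taken verbatim from the table; the helper name capture is made up for illustration.

```python
import re

# Same two patterns as in the table: only the first group's quantifier differs.
GREEDY = re.compile(r"^([0-5]*)([5-9]*)$")
RELUCTANT = re.compile(r"^([0-5]*?)([5-9]*)$")


def capture(pattern, text):
    """Return (group 1, group 2), or None if the pattern does not match."""
    m = pattern.match(text)
    return m.groups() if m else None


inputs = ["54321098765", "007", "0123456789", "0506", "555", "5550555"]
for text in inputs:
    # An empty string in a tuple corresponds to <empty> in the table.
    print(f"{text:<12} {capture(GREEDY, text)!s:<24} {capture(RELUCTANT, text)}")
```

The rows only differ where a 5 sits on the boundary between the two groups, since 5 is the one character both classes accept.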

The fix

So applying this to your problem, there are two ways that you can fix this: you can make the first .* reluctant, so (see on rubular.com):

^.*?[?&]U(?:RL)?=(?<URL>.*)$

Alternatively you can just get rid of the prefix matching part altogether (see on rubular.com):

[?&]U(?:RL)?=(?<URL>.*)$
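As a rough check, the original pattern and both fixes can be compared in Python on the URL from the question. This is a sketch under assumptions: named groups are written (?P<URL>...) because that is the syntax Python's re module accepts, re.IGNORECASE stands in for the case-insensitive option the original setup presumably used, and extract_url is a hypothetical helper.

```python
import re

SAMPLE = "http://localhost?a=b&u=http://otherhost?foo=bar&url=http://someotherhost"

# Assumption: re.IGNORECASE replaces the original engine's case-insensitive option.
ORIGINAL = re.compile(r"^.*[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)
RELUCTANT_PREFIX = re.compile(r"^.*?[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)
NO_PREFIX = re.compile(r"[?&]U(?:RL)?=(?P<URL>.*)$", re.IGNORECASE)


def extract_url(pattern, text):
    """Return the URL group, or None when nothing matches.

    search() is used instead of match() because the prefix-free pattern
    is not anchored at the start of the input.
    """
    m = pattern.search(text)
    return m.group("URL") if m else None


for name, pattern in [
    ("original", ORIGINAL),
    ("reluctant prefix", RELUCTANT_PREFIX),
    ("no prefix", NO_PREFIX),
]:
    print(f"{name:<17} {extract_url(pattern, SAMPLE)}")
```

Both fixes work for the same reason: the match for the u=/url= marker is found at its leftmost position, so the greedy URL group gets everything after the first marker.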
