Regular expression greedy on left side only (.NET)

Question

I am trying to capture matches between two strings.

For example, I am looking for all text that appears between Q and XYZ, using the "soonest" match (not continuing to expand outwards). This string:

circus Q hello there Q SOMETEXT XYZ today is the day XYZ okay XYZ

Should return:

Q SOMETEXT XYZ

However, it returns:

Q hello there Q SOMETEXT XYZ

Here is the expression I'm using: Q.*?XYZ

It's going too far back to the left. It works fine on the right side when I use the question mark after the asterisk. How can I do the same for the left side, so that it stops once it hits that first Q to the left, the same way the right side stops at the first XYZ? I've tried question marks and other symbols from http://msdn.microsoft.com/en-us/library/az24scfc.aspx, but there's something I'm just not figuring out.

I'm a regex novice, so any help on this would be appreciated!

Answer

Well, the non-greedy match is working: it gets the shortest string that satisfies the regex, starting from the leftmost position where a match is possible. The thing you have to remember is that regex matching is a left-to-right process. So it matches the first Q, then takes the fewest characters that are followed by an XYZ. If you don't want it to go past any Q, you have to use a negated character class:

Q[^Q]*?XYZ
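To see both behaviours side by side, here is a small sketch that uses Python's `re` module instead of .NET (an assumption for brevity: both are backtracking engines that report the leftmost match, and `Q.*?XYZ` and `Q[^Q]*?XYZ` behave the same way in each). It runs the question's pattern and the negated-class version on the question's sample string:

```python
import re

text = "circus Q hello there Q SOMETEXT XYZ today is the day XYZ okay XYZ"

# Lazy .*? still starts at the leftmost Q that can begin a match; it only
# stops early on the right, at the first XYZ.
lazy_dot = re.search(r"Q.*?XYZ", text).group()

# [^Q] cannot step over a second Q, so the attempt from the first Q fails
# and the engine retries from the next Q.
negated = re.search(r"Q[^Q]*?XYZ", text).group()

# findall collects every non-overlapping match; there is only one here
# because no Q appears after the end of that match.
all_matches = re.findall(r"Q[^Q]*?XYZ", text)

print(lazy_dot)     # Q hello there Q SOMETEXT XYZ
print(negated)      # Q SOMETEXT XYZ
print(all_matches)  # ['Q SOMETEXT XYZ']
```

In .NET the same pattern strings can be passed unchanged to `Regex.Match` or `Regex.Matches`.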

[^Q] matches any one character that is not a Q. Mind that this only works for a single character. If your opening delimiter is multiple characters, you have to do it a different way. Why? Well, take the delimiter 'PQR' and the string

foo PQR bar XYZ 

If you use the regex from before, but extend the character class to:

PQR[^PQR]*?XYZ

then you will get

'PQR bar XYZ'

as you would expect. But if your string is

foo PQR Party Time! XYZ 

You'll get no matches, because 'Party' contains a 'P'. The reason is that [] delimits a "character class", which matches exactly one character. Using these classes, you can match any one of a set of characters simply by listing them.

th[ae]n

will match both 'than' and 'then', but not 'thin'. Placing a caret ('^') at the beginning negates the class, meaning "match anything but these characters". So by turning our one-character class into [^PQR], rather than saying "not 'PQR'", you're saying "not 'P', 'Q', or 'R'". You can still use this if you want, but only if you're 100% sure that the characters of your delimiter never appear anywhere except in the delimiter itself. If that's the case, it's faster to negate only the first character of your delimiter and use greedy matching, provided XYZ appears only once between delimiters: a greedy [^P]* runs on to the last XYZ before the next P. The regex for that would be:

PQR[^P]*XYZ 
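Here is a Python `re` sketch of the character-class points above (Python again standing in for .NET; character classes and quantifiers behave the same way for these patterns). It uses the answer's sample strings plus one made-up string, `"PQR a XYZ b XYZ"`, to show how a greedy `[^P]*` overshoots when XYZ appears twice:

```python
import re

# A character class matches exactly one character from the listed set.
th_results = {w: bool(re.fullmatch(r"th[ae]n", w)) for w in ["than", "then", "thin"]}

# [^PQR] means "one character that is not P, Q or R", not "not the string PQR".
class_ok = re.search(r"PQR[^PQR]*?XYZ", "foo PQR bar XYZ")
class_fail = re.search(r"PQR[^PQR]*?XYZ", "foo PQR Party Time! XYZ")  # 'Party' has a P

# Negating only the first delimiter character: the greedy form runs on to
# the last XYZ before the next P, while the lazy form stops at the first.
greedy = re.search(r"PQR[^P]*XYZ", "PQR a XYZ b XYZ").group()
lazy = re.search(r"PQR[^P]*?XYZ", "PQR a XYZ b XYZ").group()

print(th_results)        # {'than': True, 'then': True, 'thin': False}
print(class_ok.group())  # PQR bar XYZ
print(class_fail)        # None
print(greedy)            # PQR a XYZ b XYZ
print(lazy)              # PQR a XYZ
```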

But if you can't make that guarantee, match with:

PQR(?:.(?!PQR))*?XYZ
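A short Python `re` sketch of this pattern (Python is used only for illustration; lookahead and lazy quantifiers behave the same way in .NET here, and the test strings are made up for the example). It also shows one edge case: the lookahead is checked after each consumed character, so a PQR that starts immediately after the opening PQR is never tested and back-to-back delimiters slip through. Putting the lookahead before the dot, `PQR(?:(?!PQR).)*?XYZ`, checks every position; that variant is swapped in here and is not part of the original answer.

```python
import re

pattern = r"PQR(?:.(?!PQR))*?XYZ"

# Letters from the delimiter are fine inside the match now...
party = re.search(pattern, "foo PQR Party Time! XYZ").group()
# ...and a repeated opening delimiter makes the engine start from the later one.
nested = re.search(pattern, "PQR hello there PQR SOMETEXT XYZ today XYZ").group()

# Edge case: the position right after the opening PQR is never checked,
# so a second PQR directly following it ends up inside the match.
edge = re.search(pattern, "PQRPQR x XYZ").group()
# Checking the lookahead before each character closes that gap.
tempered = re.search(r"PQR(?:(?!PQR).)*?XYZ", "PQRPQR x XYZ").group()

print(party)     # PQR Party Time! XYZ
print(nested)    # PQR SOMETEXT XYZ
print(edge)      # PQRPQR x XYZ
print(tempered)  # PQR x XYZ
```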

Regex has no direct syntax for "anything except this string" (a negated character class only excludes single characters), so you have to use a negative lookahead.

(?!PQR)

is just such a lookahead. It means "Assert that the next few characters are not this internal regex", without matching any characters, so

.(?!PQR)

matches any character not followed by PQR. Wrap that in a group so that you can lazily repeat it,

(.(?!PQR))*?

and you have a match for "a string that doesn't contain my delimiter". The only other thing I did was add ?: to make it a non-capturing group.

(?:.(?!PQR))*?

Depending on the language you use to run your regex, it may return every matched group individually (useful for find and replace). The ?: keeps it from doing that.
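The effect of `?:` is easy to see with Python's `re.findall`, which returns the captured group's text instead of the whole match whenever the pattern has exactly one capturing group. This is Python-specific behaviour used as an illustration (in .NET, groups show up in `Match.Groups` instead); the sample string is the one from the answer:

```python
import re

text = "foo PQR bar XYZ"
capturing_pattern = r"PQR(.(?!PQR))*?XYZ"
non_capturing_pattern = r"PQR(?:.(?!PQR))*?XYZ"

# Number of capturing groups each pattern defines.
capture_counts = (re.compile(capturing_pattern).groups,
                  re.compile(non_capturing_pattern).groups)

# With one capturing group, findall returns that group's text, and a
# repeated group keeps only its last repetition (the space before XYZ).
capturing = re.findall(capturing_pattern, text)
# With ?: there is no group to report, so findall returns the whole match.
non_capturing = re.findall(non_capturing_pattern, text)

print(capture_counts)  # (1, 0)
print(capturing)       # [' ']
print(non_capturing)   # ['PQR bar XYZ']
```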

Happy regexing!
