Regex Until But Not Including


Question

For regex, what is the syntax for "search until but not including"? Something like:

Haystack:
The quick red fox jumped over the lazy brown dog

Expression:
.*?quick -> and then everything until it hits the letter "z" but do not include z

Recommended answer

The explicit way of saying "search until X but not including X" is:

(?:(?!X).)*

where X can be any regular expression.

In your case, though, this might be overkill. Here the easiest way would be

[^z]*

This will match any run of characters other than z and therefore stop right before the next z.

So .*?quick[^z]* will match "The quick red fox jumped over the la".
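As a quick check of the negated character class, here is a minimal Python sketch using the standard `re` module and the haystack from the question. The variable names are illustrative only.

```python
import re

haystack = "The quick red fox jumped over the lazy brown dog"

# [^z]* consumes every character that is not "z",
# so the match ends right before the "z" in "lazy".
match = re.search(r".*?quick[^z]*", haystack)
result_negated_class = match.group(0) if match else None

print(repr(result_negated_class))
```

The match ends with "la" because the next character, z, is excluded by the character class.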

However, as soon as you have more than one simple letter to look out for, (?:(?!X).)* comes into play, for example

(?:(?!lazy).)* - match anything until the start of the word lazy.

This uses a lookahead assertion, more specifically a negative lookahead.

.*?quick(?:(?!lazy).)* will match "The quick red fox jumped over the " (including the trailing space).

Explanation:

(?:        # Match the following but do not capture it:
 (?!lazy)  # first assert that it's not possible to match "lazy" here,
 .         # then match any character
)*         # end of group, zero or more repetitions.
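The commented pattern above can be tried directly in Python, where the `re.VERBOSE` flag allows whitespace and `#` comments inside the pattern. This is a sketch against the haystack from the question, not the only way to write it.

```python
import re

haystack = "The quick red fox jumped over the lazy brown dog"

pattern = re.compile(
    r"""
    .*?quick      # lazily match up to and including 'quick'
    (?:           # non-capturing group:
      (?!lazy)    #   assert that 'lazy' does not start here,
      .           #   then consume one character
    )*            # repeat zero or more times
    """,
    re.VERBOSE,
)

match = pattern.search(haystack)
result_until_lazy = match.group(0) if match else None

print(repr(result_until_lazy))
```

The result keeps the space before "lazy", because the lookahead only fails at the position where "lazy" itself begins.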

Furthermore, when searching for keywords, you might want to surround them with word boundary anchors: \bfox\b will only match the complete word fox but not the fox in foxy.
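To illustrate the word boundary anchor `\b`, the following Python sketch compares a bare keyword with one wrapped in `\b...\b`. The sample text is made up for the example.

```python
import re

text = "a foxy fox"

# Without anchors, "fox" is also found inside "foxy".
plain_starts = [m.start() for m in re.finditer(r"fox", text)]

# With \b on both sides, only the standalone word matches.
bounded_starts = [m.start() for m in re.finditer(r"\bfox\b", text)]

print(plain_starts, bounded_starts)
```

The same anchors can be used inside the lookahead, for example (?:(?!\blazy\b).)*, if the stop word should only count as a whole word.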

Note

If the text to be matched can also include line breaks, you will need to set the "dot matches all" option of your regex engine. Usually, you can achieve that by prepending (?s) to the regex, but that doesn't work in all regex engines (notably JavaScript).
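In Python, for instance, the "dot matches all" option is the `re.DOTALL` flag, and the inline form (?s) is accepted at the start of the pattern. The sketch below uses a variant of the haystack with a line break inserted, which is an assumption made for the demonstration.

```python
import re

# Variant of the haystack with a line break before "jumped".
text = "The quick red fox\njumped over the lazy brown dog"
pattern = r"quick(?:(?!lazy).)*"

# By default "." does not match "\n", so the match stops at the line break.
without_dotall = re.search(pattern, text).group(0)

# With DOTALL (flag or inline (?s)), "." also matches "\n".
with_flag = re.search(pattern, text, re.DOTALL).group(0)
with_inline = re.search(r"(?s)" + pattern, text).group(0)

print(repr(without_dotall))
print(repr(with_flag))
```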

Alternative solution:

In many cases, you can also use a simpler, more readable solution based on a lazy quantifier. Adding a ? to the * quantifier makes it match as few characters as possible from the current position:

.*?(?=(?:X)|$)

will match any number of characters, stopping right before X (which can be any regex) or at the end of the string (if X doesn't match). You may also need to set the "dot matches all" option for this to work. (Note: I added a non-capturing group around X in order to reliably isolate it from the alternation.)
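A small Python sketch of this lazy variant follows. The helper name `text_until` is hypothetical and exists only for this example; it inserts X into the pattern unescaped, so X is treated as a regex, as in the answer.

```python
import re

def text_until(x: str, text: str) -> str:
    """Return the text from the start up to, but not including, the first
    match of regex x, or the whole text if x never matches.
    (Hypothetical helper for illustration.)"""
    pattern = r".*?(?=(?:" + x + r")|$)"
    # DOTALL lets "." cross line breaks; ".*?" can match the empty string,
    # so re.match always succeeds here.
    return re.match(pattern, text, re.DOTALL).group(0)

haystack = "The quick red fox jumped over the lazy brown dog"

stop_at_lazy = text_until("lazy", haystack)
stop_at_either = text_until("brown|lazy", haystack)
no_stop_word = text_until("cat", haystack)

print(repr(stop_at_lazy))
```

One caveat for Python specifically: `$` also matches just before a single trailing newline, so for text ending in "\n" the result would exclude that final newline.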
