Regex Until But Not Including

Question

For regex, what is the syntax for "search until but not including"? Something like:

Haystack:
The quick red fox jumped over the lazy brown dog

Expression:
.*?quick -> and then everything until it hits the letter "z" but do not include z

Recommended answer

The explicit way of saying "search until X but not including X" is:

(?:(?!X).)*

where X can be any regular expression.

In your case, though, this might be overkill; here the easiest way would be

[^z]*

This will match any run of characters other than z and therefore stop right before the next z.

So .*?quick[^z]* will match The quick red fox jumped over the la.
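As a quick check of the negated character class, here is a minimal Python sketch using the haystack from the question. The variable names are only illustrative.

```python
import re

haystack = "The quick red fox jumped over the lazy brown dog"

# [^z]* consumes every character that is not "z",
# so the match ends immediately before the first "z" after "quick".
match = re.search(r".*?quick[^z]*", haystack)
result = match.group(0)
print(result)  # The quick red fox jumped over the la
```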

However, as soon as you need to stop at something longer than a single character, (?:(?!X).)* comes into play, for example

(?:(?!lazy).)* - match anything until the start of the word lazy.

This is using a lookahead assertion, more specifically a negative lookahead.

.*?quick(?:(?!lazy).)* will match The quick red fox jumped over the (including the trailing space before lazy).

Explanation:

(?:        # Match the following but do not capture it:
 (?!lazy)  # first assert that it's not possible to match "lazy" here,
 .         # then match any character
)*         # end of group, zero or more repetitions.
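The commented pattern above can be run almost verbatim in Python with the re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. This is a small sketch against the question's haystack, not the only way to write it.

```python
import re

haystack = "The quick red fox jumped over the lazy brown dog"

pattern = re.compile(
    r"""
    .*?quick      # lazily match up to and including "quick"
    (?:           # non-capturing group:
      (?!lazy)    #   assert that "lazy" does not start here,
      .           #   then consume one character
    )*            # repeat zero or more times
    """,
    re.VERBOSE,
)

result = pattern.search(haystack).group(0)
print(repr(result))  # 'The quick red fox jumped over the '
```

The match keeps the space before lazy, because the lookahead only stops the repetition at the position where lazy itself begins.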

Furthermore, when searching for keywords, you might want to surround them with word boundary anchors: \bfox\b will only match the complete word fox but not the fox in foxy.
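A short sketch of the difference the word boundary anchors make. The sample text is made up for illustration.

```python
import re

text = "The foxy fox"

# Without anchors, "fox" is also found inside "foxy".
loose = [m.start() for m in re.finditer(r"fox", text)]

# With \b on both sides, only the standalone word matches.
whole = [m.start() for m in re.finditer(r"\bfox\b", text)]

print(loose)  # [4, 9]
print(whole)  # [9]
```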

Note

If the text to be matched can also include linebreaks, you will need to set the "dot matches all" option of your regex engine. Usually, you can achieve that by prepending (?s) to the regex, but that doesn't work in all regex engines (notably JavaScript, where the s flag introduced in ES2018 serves the same purpose).
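In Python, for instance, the option is available both as the re.DOTALL flag and as the inline (?s) prefix. The sketch below uses a made-up two-line string to show the effect on the lookahead pattern.

```python
import re

text = "quick red\nfox over the lazy dog"
pattern = r"quick(?:(?!lazy).)*"

# By default "." does not match a newline, so the match stops at the line end.
without_flag = re.search(pattern, text).group(0)

# With "dot matches all", the match continues across the newline up to "lazy".
with_flag = re.search(pattern, text, re.DOTALL).group(0)

# The inline form is equivalent when placed at the start of the pattern.
inline = re.search(r"(?s)" + pattern, text).group(0)

print(repr(without_flag))  # 'quick red'
print(repr(with_flag))     # 'quick red\nfox over the '
```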

Alternative solution:

In many cases, you can also use a simpler, more readable solution that uses a lazy quantifier. With a ? added to the * quantifier, the regex will try to match as few characters as possible from the current position:

.*?(?=(?:X)|$)

will match any number of characters, stopping right before X (which can be any regex) or at the end of the string (if X doesn't match). You may also need to set the "dot matches all" option for this to work. (Note: I added a non-capturing group around X in order to reliably isolate it from the alternation.)
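A minimal Python sketch of the lazy-quantifier variant follows. The helper name text_before is hypothetical and introduced only for illustration; it splices X into the pattern as a raw regex, so it assumes the caller passes a valid expression.

```python
import re

def text_before(x: str, text: str) -> str:
    """Return the text from the start up to, but not including, the first match of x.

    Hypothetical helper: x is inserted as a regex inside a non-capturing group,
    so an alternation such as "cat|dog" stays isolated from the "|$" branch.
    """
    pattern = r".*?(?=(?:" + x + r")|$)"
    return re.match(pattern, text, re.DOTALL).group(0)

haystack = "The quick red fox jumped over the lazy brown dog"

before_lazy = text_before("lazy", haystack)
before_pet = text_before("cat|dog", haystack)
no_match = text_before("zebra", haystack)

print(repr(before_lazy))  # 'The quick red fox jumped over the '
print(repr(before_pet))   # 'The quick red fox jumped over the lazy brown '
print(repr(no_match))     # the whole haystack, since "zebra" never occurs
```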
