Regex negative lookbehind not valid in JavaScript


Problem description


Consider:

var re = /(?<=foo)bar/gi;

It is an invalid regular expression in Plunker. Why?

Solution

JavaScript engines that predate ES2018 lack support for lookbehinds such as (?<=…) (positive) and (?<!…) (negative), which is why the expression above is rejected. That doesn't mean you can't still implement this sort of logic on those engines.
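Engines that implement ES2018 or later do accept lookbehind syntax, so whether the workarounds below are needed can be checked at runtime. A minimal sketch, assuming a hypothetical helper name supportsLookbehind (it is not part of any library):

```javascript
// Hypothetical helper: reports whether this engine parses lookbehind syntax.
function supportsLookbehind() {
  try {
    // Build the pattern from a string so that an unsupported engine throws
    // here at runtime instead of failing to parse the whole script.
    new RegExp("(?<=foo)bar");
    return true;
  } catch (e) {
    return false; // the engine rejected the pattern as invalid
  }
}

var hasLookbehind = supportsLookbehind();
```

When hasLookbehind is false, the emulations in the rest of this answer apply.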

Matching (not global)

Positive lookbehind match:

// from /(?<=foo)bar/i
var matcher = mystring.match( /foo(bar)/i );
if (matcher) {
  // do stuff with matcher[1] which is the part that matches "bar"
}

Fixed width negative lookbehind match:

// from /(?<!foo)bar/i
var matcher = mystring.match( /(?!foo)(?:^.{0,2}|.{3})(bar)/i );
if (matcher) {
  // do stuff with matcher[1] ("bar"), knowing that it does not follow "foo"
}

Negative lookbehinds can be done without the global flag, but only with a fixed width, and you have to calculate that width (which can get difficult with alternations). Using (?!foo).{3}(bar) would be simpler and roughly equivalent, but .{3} demands three characters before "bar" and . can't match newlines, so it won't match a line starting with "rebar". That is why the code above needs the alternation: ^.{0,2} matches lines featuring "bar" before character four.
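To see how the fixed width pattern behaves, here is a small sketch that runs it against a few inputs. The sample strings and the variable names notAfterFoo, samples and results are made up for illustration:

```javascript
// Emulates /(?<!foo)bar/i for a three-character lookbehind.
var notAfterFoo = /(?!foo)(?:^.{0,2}|.{3})(bar)/i;

// Hypothetical sample inputs, chosen only for illustration.
var samples = ["foobar", "rebar", "bar", "a crowbar"];

var results = samples.map(function (s) {
  var m = s.match(notAfterFoo);
  return m ? m[1] : null; // capture group 1 holds the "bar" part
});
// "foobar" yields null; the other three inputs yield "bar"
```

The ^.{0,2} branch is what lets "rebar" and "bar" match even though fewer than three characters precede "bar".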

If you need it with a variable width, use the global solution below and put a break at the end of the if stanza. (This limitation is quite common. When this answer was written, .NET, vim, and JGsoft were the only regex engines that supported variable width lookbehind. PCRE, PHP, and Perl were limited to fixed width. Python required an alternate regex module to support this. That said, the logic of the workaround below should work for all languages that support regex.)

Matching (global)

When you need to loop over each match in a given string (the g modifier, global matching), you have to reassign the matcher variable in each loop iteration, and you must use RegExp.exec() (with the RegExp created before the loop). String.match() interprets the global modifier differently: it returns every match at once and starts from the beginning of the string on each call, so using it as the loop condition will create an infinite loop!

Global positive lookbehind:

var re = /foo(bar)/gi;  // from /(?<=foo)bar/gi
var matcher;            // declared up front so it does not leak as a global
while ( (matcher = re.exec(mystring)) !== null ) {
  // do stuff with matcher[1] which is the part that matches "bar"
}

"Stuff" may of course include populating an array for further use.
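As one possible form of that "stuff", the loop can record each emulated match and where it starts. This is only a sketch; the input string and the found array are assumptions for illustration:

```javascript
var mystring = "foobar rebar FooBar";  // hypothetical input
var re = /foo(bar)/gi;                  // from /(?<=foo)bar/gi
var found = [];
var matcher;

while ((matcher = re.exec(mystring)) !== null) {
  found.push({
    text: matcher[1],
    // matcher.index points at "foo"; skip past it to where "bar" begins
    index: matcher.index + matcher[0].length - matcher[1].length
  });
}
// found is [{ text: "bar", index: 3 }, { text: "Bar", index: 16 }]
```

Computing the offset from the match lengths, rather than hard-coding 3, keeps the position right if the prefix pattern changes.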

Global Negative lookbehind:

var re = /(foo)?bar/gi;  // from /(?<!foo)bar/gi
var matcher;             // declared up front so it does not leak as a global
while ( (matcher = re.exec(mystring)) !== null ) {
  if (!matcher[1]) {
    // do stuff with matcher[0] ("bar"), knowing that it does not follow "foo"
  }
}

Note that there are cases in which this will not fully represent the negative lookbehind. Consider /(?<!ba)ll/g matching against Fall ball bill balll llama. It will find only three of the desired four matches because when it parses balll, it finds ball and then continues one character late, at l llama. This only occurs when a partial match at the end could interfere with a partial match at a different end (balll breaks (ba)?ll but foobarbar is fine with (foo)?bar). One way around this is the fixed width method above, with the caveat that in a global loop it consumes the characters before each match, so a match that closely follows the previous one can still be skipped.
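The missed match can be reproduced with the sample string from the paragraph above. The second half of this sketch is a different technique from the ones in this answer, offered only as an assumption-laden alternative: it matches the bare "ll", checks the two preceding characters by hand, and rewinds lastIndex after a rejected match. All variable names are made up for illustration:

```javascript
var text = "Fall ball bill balll llama";
var m;

// Workaround from above: optional group, keep matches where it is absent.
var re = /(ba)?ll/g;
var viaOptionalGroup = [];
while ((m = re.exec(text)) !== null) {
  if (!m[1]) {
    viaOptionalGroup.push(m.index);
  }
}
// viaOptionalGroup is [2, 12, 21]: the second "ll" in "balll" is missed

// Alternative sketch (not the method above): inspect the preceding
// characters manually and retry one character later after a rejection.
var bare = /ll/g;
var viaManualCheck = [];
while ((m = bare.exec(text)) !== null) {
  var before = text.slice(Math.max(0, m.index - 2), m.index);
  if (before === "ba") {
    bare.lastIndex = m.index + 1; // allow an overlapping candidate
  } else {
    viaManualCheck.push(m.index);
  }
}
// viaManualCheck is [2, 12, 18, 21]
```

Because nothing before the match is consumed, this variant does not depend on the width being fixed inside the regex, only on how many characters the manual check slices off.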

Replacing

There's a wonderful article called Mimicking Lookbehind in JavaScript that describes how to do this.
It even has a follow-up that points to a collection of short functions that implement this in JS.

Implementing lookbehind in String.replace() is much easier since you can create an anonymous function as the replacement and handle the lookbehind logic in that function.

These work on the first match but can be made global by merely adding the g modifier.

Positive lookbehind replacement:

// assuming you wanted mystring.replace(/(?<=foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $1 + "baz" : $0) }
);

This takes the target string and replaces instances of bar with baz so long as they follow foo. If they do, $1 is matched and the ternary operator (?:) returns the matched text and the replacement text (but not the bar part). Otherwise, the ternary operator returns the original text.
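A quick sketch of the same replacement with the g modifier added, run on a made-up input (the string and the name replaced are assumptions):

```javascript
var mystring = "foobar bar Foobar";  // hypothetical input

// Emulates mystring.replace(/(?<=foo)bar/gi, "baz")
var replaced = mystring.replace(/(foo)?bar/gi, function ($0, $1) {
  // $1 is undefined when "bar" was not preceded by "foo"
  return $1 ? $1 + "baz" : $0;
});
// replaced is "foobaz bar Foobaz"
```

Re-emitting $1 preserves the original casing of the prefix, which matters once the i modifier is in play.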

Negative lookbehind replacement:

// assuming you wanted mystring.replace(/(?<!foo)bar/i, "baz"):
mystring = mystring.replace( /(foo)?bar/i,
  function ($0, $1) { return ($1 ? $0 : "baz") }
);

This is essentially the same, but since it's a negative lookbehind, it acts when $1 is missing (we don't need to say $1 + "baz" here because we know $1 is empty).

This has the same caveat as the other dynamic-width negative lookbehind workaround and is similarly fixed by using the fixed width method.
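The caveat is easy to observe in a replacement as well. This sketch reuses the Fall ball bill balll llama string from earlier, with "XX" as an arbitrary replacement chosen for illustration:

```javascript
var text = "Fall ball bill balll llama";

// Emulates text.replace(/(?<!ba)ll/g, "XX") with the optional-group trick.
var out = text.replace(/(ba)?ll/g, function ($0, $1) {
  return $1 ? $0 : "XX";
});
// out is "FaXX ball biXX balll XXama"; a true negative lookbehind would
// also replace the second "ll" in "balll"
```

The untouched "balll" is the same missed match described for the global exec() loop.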
