Regex that can match an empty string is breaking the JavaScript regex engine

Question

I wrote the following regex: /\D(?!.*\D)|^-?|\d+/g

I think it should work this way:

\D(?!.*\D)    # match the last non-digit
|             # or
^-?           # match the start of the string with optional literal '-' character
|             # or
\d+           # match digits

However, it does not:

var arrTest = '12,345,678.90'.match(/\D(?!.*\D)|^-?|\d+/g);
console.log(arrTest);

var test = arrTest.join('').replace(/[^\d-]/, '.');
console.log(test);

However, when I try it online at Regex101 with the PCRE (PHP) flavor, it works as I described.

I don't know whether my understanding of how it should work is wrong, or whether some patterns are not allowed in the JavaScript regex flavor.

Recommended answer

JS works differently from PCRE. The point is that the JS regex engine handles zero-length matches poorly: with a global regex, the index is simply incremented by one after a zero-length match, so the character at that position is skipped. Here, ^-? can match an empty string, and it does so at the start of 12,345,678.90, which causes the 1 to be skipped.
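To make the skipped character visible, here is a small reproduction using the pattern and input from the question. The variable names are illustrative, and the second pattern (with `^-` instead of `^-?`) is only a contrast case showing that the digit is kept once no alternative can match the empty string.

```javascript
var input = '12,345,678.90';

// Pattern from the question: ^-? matches the empty string at index 0,
// so the next search starts at index 1 and the leading "1" is never matched.
var withEmptyMatch = input.match(/\D(?!.*\D)|^-?|\d+/g);
console.log(withEmptyMatch);          // [ '', '2', '345', '678', '.', '90' ]
console.log(withEmptyMatch.join('')); // 2345678.90

// Contrast: with ^- (no "?") no alternative can match an empty string.
var withoutEmptyMatch = input.match(/\D(?!.*\D)|^-|\d+/g);
console.log(withoutEmptyMatch);          // [ '12', '345', '678', '.', '90' ]
console.log(withoutEmptyMatch.join('')); // 12345678.90
```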

If we look at how String#match is specified, we see that with a global regex the regex object's lastIndex is incremented after a zero-length match is found:

  8. Else, global is true
    a. Call the [[Put]] internal method of rx with arguments "lastIndex" and 0.
    b. Let A be a new array created as if by the expression new Array() where Array is the standard built-in constructor with that name.
    c. Let previousLastIndex be 0.
    d. Let n be 0.
    e. Let lastMatch be true.
    f. Repeat, while lastMatch is true
        i. Let result be the result of calling the [[Call]] internal method of exec with rx as the this value and argument list containing S.
        ii. If result is null, then set lastMatch to false.
        iii. Else, result is not null
            1. Let thisIndex be the result of calling the [[Get]] internal method of rx with argument "lastIndex".
            2. If thisIndex = previousLastIndex then
                a. Call the [[Put]] internal method of rx with arguments "lastIndex" and thisIndex+1.
                b. Set previousLastIndex to thisIndex+1.
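The steps above can be sketched as a plain loop around exec. This is only an illustration, not the engine's implementation: the function name matchGlobalSketch is made up, and the parts the excerpt does not show (the else branch that updates previousLastIndex, collecting result[0] into the array, and returning null when nothing matched) are assumptions filled in to make the loop complete.

```javascript
// Hypothetical re-creation of the quoted steps; not an engine internal.
function matchGlobalSketch(rx, S) {
  rx.lastIndex = 0;                 // 8a
  var A = [];                       // 8b
  var previousLastIndex = 0;        // 8c
  var lastMatch = true;             // 8e
  while (lastMatch) {               // 8f
    var result = rx.exec(S);        // 8f.i
    if (result === null) {
      lastMatch = false;            // 8f.ii
    } else {                        // 8f.iii
      var thisIndex = rx.lastIndex;
      if (thisIndex === previousLastIndex) {
        // Zero-length match: move one position forward, skipping a character.
        rx.lastIndex = thisIndex + 1;
        previousLastIndex = thisIndex + 1;
      } else {
        previousLastIndex = thisIndex; // assumed: not part of the excerpt
      }
      A.push(result[0]);               // assumed: not part of the excerpt
    }
  }
  return A.length === 0 ? null : A;    // assumed: not part of the excerpt
}

console.log(matchGlobalSketch(/\D(?!.*\D)|^-?|\d+/g, '12,345,678.90'));
// [ '', '2', '345', '678', '.', '90' ]
```

The first exec call matches the empty string at index 0 and leaves lastIndex at 0, so the loop bumps it to 1 and the following search can no longer see the leading 1.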

So, the matching process runs steps 8a through 8e to initialize the auxiliary structures, then enters the loop at 8f, which repeats while lastMatch is true. The internal exec call matches the empty string at the start of the input (8fi -> 8fiii). Since the result is not null, thisIndex is read from the regex's lastIndex. The match was zero-length and ended at index 0, so thisIndex = previousLastIndex. Both lastIndex and previousLastIndex are then set to thisIndex+1, which means the current position is skipped after a successful zero-length match.

You can actually use a simpler regex with the replace method and a callback that picks the appropriate replacement:

var res = '-12,345,678.90'.replace(/(\D)(?!.*\D)|^-|\D/g, function($0,$1) {
   // $1 is set only when the first alternative matched, i.e. for the last non-digit
   return $1 ? "." : "";
});
console.log(res); // 12345678.90

Pattern details:

  • (\D)(?!.*\D) - a non-digit (captured into Group 1) that is not followed by 0+ chars other than a newline and then another non-digit, i.e. the last non-digit on the line
  • | - or
  • ^- - a hyphen at the string start
  • | - or
  • \D - a non-digit

Note that here you do not even have to make the hyphen at the start optional.
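One caveat about the snippet above: its callback returns an empty string whenever Group 1 is unset, and that includes the ^- alternative, so for '-12,345,678.90' it logs 12345678.90 without the sign. If the leading hyphen is meant to be kept (an assumption about the intent, based on the [^\d-] replacement in the question), one possible sketch captures the sign in its own group. The helper name normalizeNumber and the group layout are illustrative and not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer).
function normalizeNumber(input) {
  // Group 1: a hyphen at the string start (the sign).
  // Group 2: the last non-digit (treated as the decimal separator).
  // The final \D alternative matches every other non-digit (grouping separators).
  return input.replace(/^(-)|(\D)(?!.*\D)|\D/g, function (match, sign, lastSep) {
    if (sign) return '-';    // keep the sign
    if (lastSep) return '.'; // normalize the decimal separator
    return '';               // drop grouping separators
  });
}

console.log(normalizeNumber('-12,345,678.90')); // -12345678.90
console.log(normalizeNumber('12,345,678.90'));  // 12345678.90
```

Putting ^(-) first means a leading hyphen is always treated as the sign, even when it is the only non-digit in the input.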
