Regex poor performance when nothing matches

Problem description

I have a problem with a slow regex, but only when the pattern doesn't match. In all other cases performance is acceptable, even if the pattern matches at the end of the text. I am testing performance on a 100KB text input.

What I am trying to do is take input in an HTML-like syntax that uses [] instead of <> brackets and translate it to valid XML.

Sample input:

...some content[vc_row param="test1"][vc_column]text [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content

Sample output:

...some content<div class="vc_row" param="test1"><div class="vc_column" >text [brackets in text] content</div></div><div class="vc_row" param="xxx">text content</div>...some more content

To do this I am using the following regex:

/(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/

I run this in a while loop for as long as the pattern still matches.

As I mentioned before, this works, but the last iteration is extremely slow (or the first one, if nothing matches). Here is the complete JavaScript I am using:

var str   = '...some content[vc_row param="test1"][vc_column]text content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

var regex = /(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/;
var matches;
// Rewrite one tag per iteration until the pattern no longer matches.
while ((matches = str.match(regex))) {
    if (matches[2].slice(1, 2) !== '/')
        str = matches[1] + "<div class=\"" + matches[2].slice(1) + "\"" + " " + matches[4] + ">" + matches[6];
    else
        str = matches[1] + "</div>" + matches[6];
}

How can I improve my regex's performance when nothing matches?

Recommended answer

You can split it up into 2 regexes: one for the opening tags and one for the closing tags.

Then chain the two replace calls, each using the global g flag.

var str   = '...some content[vc_row param="test1"][vc_column]text with [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

const reg1 = /\[(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;
const reg2 = /\[\/(vc_(?:column|row))\s*\]/g;

var result = str.replace(reg1, "<div class=\"$1\"$2>").replace(reg2, "</div>");

console.log(result);

Note that the (.*) groups in the original regex aren't needed with this approach.
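
A likely reason the original pattern is so slow on a miss is its unanchored leading (.*): in a backtracking engine, each start position lets (.*) run to the end of the line and then back off one character at a time, so a failed search can take roughly quadratic time in the line length. The sketch below is only an illustration under that assumption. The helper name timeSearch and the 10,000-character input are made up for the example, and actual timings depend on the engine and machine.

```javascript
// Pattern from the question, with leading and trailing (.*) groups.
var slowRegex = /(.*)(\[\/?vc_column|\[\/?vc_row)( ?)(.*?)(\])(.*)/;
// Pattern from the single-regex answer, without the g flag so that
// test() keeps no lastIndex state between calls.
var fastRegex = /\[(\/)?(vc_(?:column|row))(\s+[^\]]+)?\s*\]/;

// Hypothetical helper: time one search and report whether it matched.
function timeSearch(regex, text) {
    var start = Date.now();
    var found = regex.test(text);
    return { found: found, ms: Date.now() - start };
}

// A single line of 10,000 characters that contains no vc_ tags.
// (Kept smaller than the 100KB from the question so the demo stays quick.)
var noTags = new Array(10001).join('x');

var slow = timeSearch(slowRegex, noTags);
var fast = timeSearch(fastRegex, noTags);

console.log('with (.*):    found=' + slow.found + ', ' + slow.ms + ' ms');
console.log('without (.*): found=' + fast.found + ', ' + fast.ms + ' ms');
```

Both searches report no match; the difference, if any, shows up only in the elapsed time, which tends to widen as the input grows.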

With an anonymous callback function, it can be done with a single regex replace.

var str   = '...some content[vc_row param="test1"][vc_column]text with [brackets in text] content[/vc_column][/vc_row][vc_row param="xxx"]text content[/vc_row]...some more content';

const reg = /\[(\/)?(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;

var result = str.replace(reg, function(m,c1,c2,c3){
              if(c1) return "</div>";
              else return "<div class=\""+ c2 +"\""+ (c3?c3:"") +">";
             });

console.log(result);
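
For reuse, the single-regex replace can be wrapped in a function and checked against both a tagged and an untagged input. This is a minimal sketch: the name convertTags and the two test strings are assumptions for illustration, while the pattern is the one from the answer.

```javascript
// Same pattern as the single-regex answer.
var tagRegex = /\[(\/)?(vc_(?:column|row))(\s+[^\]]+)?\s*\]/g;

// Hypothetical wrapper: converts every [vc_row]/[vc_column] tag in one pass.
function convertTags(input) {
    return input.replace(tagRegex, function (match, slash, name, attrs) {
        // A captured "/" means this is a closing tag.
        if (slash) return '</div>';
        // attrs is undefined when the tag has no parameters.
        return '<div class="' + name + '"' + (attrs || '') + '>';
    });
}

var tagged = 'a[vc_row param="x"]b [not a tag] c[/vc_row]d';
console.log(convertTags(tagged));
// a<div class="vc_row" param="x">b [not a tag] c</div>d

var untagged = 'plain text with [brackets] but no vc tags';
console.log(convertTags(untagged) === untagged); // true
```

When nothing matches, replace returns the input unchanged after a single scan, so no loop over repeated match calls is needed.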
