Regular expression search: avoiding nested results

Question

My document contains several instances of code blocks that look like this:

{% highlight %}
//some code
{% endhighlight %}

In Atom.io, I am trying to write a regex search to capture those.

My first try was:
{% highlight .* %}([\S\s]+){% endhighlight %}

The problem is that, because there are several code blocks in the same document, it matches everything from the first code block through the last one in a single match.
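
The over-matching can be reproduced outside Atom. Below is a minimal sketch using Python's `re` module, which is an assumption for illustration (Atom uses a JavaScript regex engine, though a greedy quantifier behaves the same way in this case). The sample text and variable names are made up, the opening tag is simplified to a literal `{% highlight %}`, and the braces are backslash-escaped for clarity.

```python
import re

# Hypothetical sample document with two highlight blocks.
doc = (
    "{% highlight %}\nfirst block\n{% endhighlight %}\n"
    "some prose\n"
    "{% highlight %}\nsecond block\n{% endhighlight %}\n"
)

# Greedy [\S\s]+ runs to the end of the text and backtracks only as far as
# the LAST {% endhighlight %}, so both blocks end up inside one match.
greedy = re.compile(r"\{% highlight %\}([\S\s]+)\{% endhighlight %\}")

greedy_matches = greedy.findall(doc)
print(len(greedy_matches))                 # how many matches were found
print("some prose" in greedy_matches[0])   # is the text between blocks swallowed?
```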

I thought of excluding the { character:
{% highlight .* %}([^\{]+){% endhighlight %}

But the problem is that some of the code blocks contain valid { characters (such as function(){ ... }).
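
A small sketch of that failure, again in Python's `re` module as an assumed stand-in for Atom's engine. The two sample blocks are hypothetical, and the opening tag is simplified to a literal `{% highlight %}`.

```python
import re

# The second attempt from above: the block body may not contain any {.
no_brace = re.compile(r"\{% highlight %\}([^\{]+)\{% endhighlight %\}")

# Hypothetical sample blocks: one without braces, one with braces in the code.
plain_block = "{% highlight %}\nx = 1\n{% endhighlight %}"
brace_block = "{% highlight %}\nfunction(){ return 1; }\n{% endhighlight %}"

plain_match = no_brace.search(plain_block)   # body has no {, so this matches
brace_match = no_brace.search(brace_block)   # the { in the code stops [^\{]+
print(plain_match is not None, brace_match is None)
```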

Recommended answer

The problem with Karthik's lazy matching solution is that when there are large substrings between {% highlight %} and {% endhighlight %}, the [\s\S]*? has to expand one character at a time and stores more and more text in the backtracking buffer, which can eventually overrun.

Using the unrolling-the-loop technique, you can avoid that:

{% highlight %}([^{]*(?:{(?!% endhighlight %})[^{]*)*){% endhighlight %}

This way, the substrings inside the highlight blocks can be of any length and matching will stay fast.
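
To see the unrolled pattern pick out each block separately, here is a minimal sketch in Python's `re` module (an assumption for illustration; the answer targets Atom's search box). The sample document and variable names are hypothetical, and the literal braces are backslash-escaped, which the original pattern does not do.

```python
import re

# The unrolled pattern from the answer, split over three string literals:
# opening tag, captured body, closing tag.
unrolled = re.compile(
    r"\{% highlight %\}"
    r"([^{]*(?:\{(?!% endhighlight %\})[^{]*)*)"
    r"\{% endhighlight %\}"
)

# Hypothetical sample document: two blocks whose code contains { characters.
doc = (
    "{% highlight %}\nfunction a(){ return 1; }\n{% endhighlight %}\n"
    "text between\n"
    "{% highlight %}\nvar b = {x: 2};\n{% endhighlight %}\n"
)

# findall returns the contents of group 1 for every match.
blocks = unrolled.findall(doc)
for body in blocks:
    print(repr(body))
```

Each block comes back as its own match, and the braces inside the code do not end the match early.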

Main parts of the regex:

  • {% highlight %} - matches the {% highlight %} text literally
  • ([^{]*(?:{(?!% endhighlight %})[^{]*)*) - matches and captures into group 1 everything up to the closing {% endhighlight %}:
    • [^{]* - 0 or more characters other than {
    • (?:{(?!% endhighlight %})[^{]*)* - 0 or more sequences of:
      • {(?!% endhighlight %}) - a literal { not followed by % endhighlight %}
      • [^{]* - 0 or more characters other than {
  • {% endhighlight %} - matches the closing {% endhighlight %} text literally

This is basically the same as {% highlight %}([\s\S]*?){% endhighlight %}, but "unrolled". The linear execution makes matching safer and faster.
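
The equivalence can be spot-checked on a sample input. The sketch below uses Python's `re` module as an assumed test bed and a made-up document. It compares only what the two patterns capture, not how fast they run.

```python
import re

# Lazy version and unrolled version of the same search (braces escaped).
lazy = re.compile(r"\{% highlight %\}([\s\S]*?)\{% endhighlight %\}")
unrolled = re.compile(
    r"\{% highlight %\}"
    r"([^{]*(?:\{(?!% endhighlight %\})[^{]*)*)"
    r"\{% endhighlight %\}"
)

# Hypothetical sample: braces in code, a tag-like "{% b %}", and a long body.
doc = (
    "{% highlight %}\nfunction a(){ return 1; }\n{% endhighlight %}\n"
    "{% highlight %}\na {% b %} c\n{% endhighlight %}\n"
    "{% highlight %}\n" + "x" * 5000 + "\n{% endhighlight %}\n"
)

lazy_blocks = lazy.findall(doc)
unrolled_blocks = unrolled.findall(doc)

# Both patterns should capture the same three block bodies.
print(len(unrolled_blocks), lazy_blocks == unrolled_blocks)
```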
