Regular expression search avoid nested results

Views: 146 Published: 2020/9/13 19:10:13 Tags: regex atom-editor


Problem description

My document contains several instances of code blocks that look like this:

{% highlight %}
//some code
{% endhighlight %}

In Atom.io, I am trying to write a regex search to capture those.

My first try was:
{% highlight .* %}([\S\s]+){% endhighlight %}

The problem is that, because there are several code blocks in the same document, it matches from the start of the first code block through the end of the last one, all in a single match.
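The greedy behaviour can be reproduced outside the editor. The following is a minimal sketch in Python's `re` module, used here only for illustration (it is not the engine Atom uses). The sample document and its `js`/`css` language tags are made up, and the braces of the tags are escaped to be safe.

```python
import re

# Hypothetical sample document with two highlight blocks (not from the question).
doc = (
    "{% highlight js %}\n"
    "var a = 1;\n"
    "{% endhighlight %}\n"
    "Some prose in between.\n"
    "{% highlight css %}\n"
    "body { margin: 0; }\n"
    "{% endhighlight %}\n"
)

# The first attempt from the question, with the tag braces escaped.
greedy = re.compile(r"\{% highlight .* %\}([\S\s]+)\{% endhighlight %\}")

matches = greedy.findall(doc)

# [\S\s]+ is greedy: it first consumes everything to the end of the text and
# then backtracks only as far as the LAST {% endhighlight %}, so both blocks
# (and the prose between them) end up inside one match.
print(len(matches))
print("Some prose in between." in matches[0])
```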

I thought of excluding the { character:
{% highlight .* %}([^\{]+){% endhighlight %}

But the problem is that some of the code blocks contain valid { characters (such as function(){ ... }).
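A small sketch of that failure, again in Python's `re` purely for illustration; the sample block and its `js` tag are hypothetical.

```python
import re

# Hypothetical block whose code contains a literal "{" (not from the question).
doc = (
    "{% highlight js %}\n"
    "function f() { return 1; }\n"
    "{% endhighlight %}\n"
)

# The second attempt from the question, with the tag braces escaped.
no_brace = re.compile(r"\{% highlight .* %\}([^\{]+)\{% endhighlight %\}")

# [^\{]+ has to stop at the "{" of the function body, and the only thing the
# pattern accepts at that point is {% endhighlight %}, so the block is missed.
result = no_brace.search(doc)
print(result)
```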

Recommended answer

The problem with Karthik's lazy matching solution is that when there are large substrings between {% highlight %} and {% endhighlight %}, [\s\S]*? expands one character at a time and keeps storing more and more text in the backtracking buffer, which can eventually overrun.

Using the unrolling-the-loop technique, you can avoid that:

{% highlight %}([^{]*(?:{(?!% endhighlight %})[^{]*)*){% endhighlight %}

See the regex demo.

This way, the substrings inside the highlight blocks can be of any length and performance will stay fast.
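As a quick check of the unrolled pattern, here is a minimal sketch that runs it over a made-up document with two blocks, one of which contains { characters. Python's `re` is used only for illustration, and the tag braces are escaped to be safe; the pattern is otherwise the one given above.

```python
import re

# The answer's unrolled pattern, with the tag braces escaped.
unrolled = re.compile(
    r"\{% highlight %\}([^{]*(?:\{(?!% endhighlight %\})[^{]*)*)\{% endhighlight %\}"
)

# Hypothetical sample document (not from the question).
doc = (
    "{% highlight %}\n"
    "function f() { return 1; }\n"
    "{% endhighlight %}\n"
    "Some prose in between.\n"
    "{% highlight %}\n"
    "//some code\n"
    "{% endhighlight %}\n"
)

# findall returns the contents of group 1 for every match.
blocks = unrolled.findall(doc)
print(len(blocks))
for body in blocks:
    print(repr(body))
```

Each block is captured separately, and the braces inside the first block do not end the match early.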

The main parts of the regex:

- {% highlight %} - matches the {% highlight %} text literally
- ([^{]*(?:{(?!% endhighlight %})[^{]*)*) - matches everything up to the first {% endhighlight %} and captures it into group 1:
  - [^{]* - 0 or more characters other than {
  - (?:{(?!% endhighlight %})[^{]*)* - 0 or more sequences of:
    - {(?!% endhighlight %}) - a literal { not followed by % endhighlight %}
    - [^{]* - 0 or more characters other than {
- {% endhighlight %} - matches the {% endhighlight %} text literally
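The same breakdown can be written as an annotated pattern. This is a sketch using Python's `re.VERBOSE` flag, which is an illustration choice and not something the answer relies on. In verbose mode unescaped whitespace is ignored, so each literal space in the tags is written as `[ ]`.

```python
import re

unrolled_verbose = re.compile(
    r"""
    \{%[ ]highlight[ ]%\}               # opening tag, matched literally
    (                                   # group 1: the block body
      [^{]*                             #   0 or more characters other than {
      (?:                               #   0 or more sequences of:
        \{(?!%[ ]endhighlight[ ]%\})    #     a { that does not start the closing tag
        [^{]*                           #     0 or more characters other than {
      )*
    )
    \{%[ ]endhighlight[ ]%\}            # closing tag, matched literally
    """,
    re.VERBOSE,
)

# Hypothetical input used only to exercise the pattern.
m = unrolled_verbose.search("{% highlight %}\nif (x) { y(); }\n{% endhighlight %}")
print(repr(m.group(1)))
```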
This is basically the same as {% highlight %}([\s\S]*?){% endhighlight %}, but "unrolled". The linear execution makes for a safer and faster search.
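To illustrate that the two patterns capture the same text, the sketch below runs both over a made-up document and compares the results. It only checks equivalence of the matches; it does not measure speed. Python's `re` is used for illustration and the tag braces are escaped.

```python
import re

lazy = re.compile(r"\{% highlight %\}([\s\S]*?)\{% endhighlight %\}")
unrolled = re.compile(
    r"\{% highlight %\}([^{]*(?:\{(?!% endhighlight %\})[^{]*)*)\{% endhighlight %\}"
)

# Hypothetical document: two blocks, each with a long body full of braces.
body = "if (a) { b(); }\n" * 1000
doc = (
    "{% highlight %}\n" + body + "{% endhighlight %}\n"
    "Some prose in between.\n"
    "{% highlight %}\n" + body + "{% endhighlight %}\n"
)

lazy_blocks = lazy.findall(doc)
unrolled_blocks = unrolled.findall(doc)

# Both patterns stop at the first closing tag after each opening tag.
print(len(unrolled_blocks))
print(lazy_blocks == unrolled_blocks)
```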
      