Searching for specific text inside balanced chars (recursive


Question

Given the following (sanitized) input:

Return_t
func()
{
  Type<SubType> cursorFeature(true);

  while (nDist < 800)
  {
    Result = Example(&var, 0, cursorFeature); //interested in this because inside loop, and not dereferenced or incremented
    if (!(++cursorFeature).NoMoreRecords())
    {
      if (!BLAH(blah)
        && (otherFunc(&var, &cursorFeature->derefenced, MACRO) != 0))
      {
        bIsChanged = true;
        break;
      }
      memcpy(&var, &cursorFeature->dereferenced, sizeof(anotherType_t));
    }
  }

  //more stuff
  }
}

I have the following regex that captures a loop occurring after a use of Type:

Type.*<.*>\s*(\w*)[^}]*?(?:while|for)\s*\(.*?\n?.*?(\{(?>[^{}]|(?-1))*\})

https://regex101.com/r/Kr0zQq/3
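
For readers whose tools lack PCRE recursion, the recursive group `(\{(?>[^{}]|(?-1))*\})` can be approximated with a plain depth counter. The following is a minimal sketch, not the asker's method: the function name `balanced_block_end` and the sample text are hypothetical, and, like the regex, it ignores braces inside strings and comments.

```python
def balanced_block_end(text, open_idx):
    """Return the index just past the brace that closes text[open_idx].

    Returns -1 when the braces never balance. Hypothetical helper that
    mirrors what the recursive regex group matches.
    """
    if text[open_idx] != "{":
        raise ValueError("open_idx must point at '{'")
    depth = 0
    for i in range(open_idx, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1          # entering a nested block
        elif ch == "}":
            depth -= 1          # leaving a block
            if depth == 0:
                return i + 1    # the outermost block just closed
    return -1                   # ran out of text: unbalanced input


# Made-up sample: a loop body with one nested block and a stray brace after it.
sample = "while (n < 800)\n{\n  if (x) { y(); }\n  z();\n}\n// tail }"
start = sample.index("{")
body = sample[start:balanced_block_end(sample, start)]
```

Here `body` holds the loop body from its opening brace through the matching closing brace, which is the span group 2 of the regex would capture.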

I also have the following regular expression that captures a specific use of a variable of type Type:

Type.*<.*>\s*(\w*)[\s\S]*?\K(?<!\+\+)\1(?!->|\+\+)

https://regex101.com/r/Kr0zQq/4
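
The lookarounds in that pattern, `(?<!\+\+)` before the name and `(?!->|\+\+)` after it, also work in Python's standard `re` module, although `\K` does not. A small sketch of just that part, assuming the variable name has already been captured separately; the helper `plain_uses` is hypothetical, and the `\b` word boundaries are an addition that the original regex does not have.

```python
import re


def plain_uses(name, text):
    """Yield offsets where `name` appears without ++ before it and
    without -> or ++ after it (hypothetical helper)."""
    pattern = re.compile(
        r"(?<!\+\+)\b" + re.escape(name) + r"\b(?!->|\+\+)"
    )
    for match in pattern.finditer(text):
        yield match.start()


# Three lines adapted from the loop in the sample input.
loop_body = (
    "Result = Example(&var, 0, cursorFeature);\n"
    "if (!(++cursorFeature).NoMoreRecords())\n"
    "memcpy(&var, &cursorFeature->dereferenced, sizeof(anotherType_t));\n"
)
hits = list(plain_uses("cursorFeature", loop_body))
```

Only the first line qualifies: the second use is pre-incremented and the third is dereferenced, so both are rejected by the lookarounds.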

My goal is to combine these, preferably into ONE regex (I'd also like to be able to run this search from within VS, if possible). Given the nature of recursive regexes, I'm not sure this is possible at all, and I suspect it isn't. If not, something clever that doesn't lose filename and line-number context when searching through hundreds of files would be awesome. I basically need the filename and line number. Context is great, but not required.
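
If a single regex turns out to be impractical, one fallback that keeps the filename and line number is a small script that walks the tree and converts match offsets to line numbers. This is only a sketch under assumptions: `scan_tree`, `line_of`, and the `*.cpp`/`*.h` globs are hypothetical, and `finder` stands for any callable that takes a file's text and yields match offsets.

```python
from pathlib import Path


def line_of(text, offset):
    """Convert a character offset into a 1-based line number."""
    return text.count("\n", 0, offset) + 1


def scan_tree(root, finder, globs=("*.cpp", "*.h")):
    """Yield (path, line_number, line_text) for each offset from finder(text).

    `finder` is any callable mapping file text to an iterable of offsets;
    the default globs are an assumption about the code base.
    """
    for pattern in globs:
        for path in sorted(Path(root).rglob(pattern)):
            # Undecodable bytes are replaced rather than aborting the scan.
            text = path.read_text(errors="replace")
            lines = text.split("\n")
            for offset in finder(text):
                number = line_of(text, offset)
                yield path, number, lines[number - 1].strip()


def report(root, finder):
    """Print hits in the familiar file:line: context format."""
    for path, number, line in scan_tree(root, finder):
        print(f"{path}:{number}: {line}")
```

Calling `report(".", finder)` with a suitable `finder` prints one `file:line: context` entry per hit, which most editors can jump to directly.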

To clarify, I want to capture cursorFeature because it's of type Type, and then I want to search for uses of it inside the "loop" {.....}

Edit

Just a note about my use of regexes to solve this problem. The code being searched is something over a million lines, spanning multiple projects compiled by various compilers and built by multiple build systems. The use of both macros and advanced language features means, for example, that even VS Intellisense often misparses code that VS is able to compile, as does YCM (vim). So a perhaps overly greedy regex that yields 70% false positives is fine. (As is missing further occurrences of the variable within a loop, since it's generally easy to scan the rest at that point.) However, attempting to do this as a one-liner using a "generic" PCRE was perhaps foolish. :)

Recommended answer

You have three options for checking whether the matched variable name appears in the following loop. The first is to add (\1) to the atomic group, which shifts the relative recursion reference from (?-1) to (?-2), and then check in your environment whether that capturing group participated in the match (if that's possible):

(?>(\1)|[^{}]|(?-2))*

Second, you could temper the matching of [^{}] with a negative lookahead:

(?>(?!\1)[^{}]|(?-1))*

but it fails unless you make the closing brace optional, as I did in the demo provided in the comments.

The third and better workaround is the verb (*ACCEPT), which ends the match successfully at that point, with no further changes to the regex:

(?>(\1)(*ACCEPT)|[^{}]|(?-2))*
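
Outside PCRE, the same "stop at the first use" idea can be sketched procedurally. The snippet below is an analogue of the `(\1)(*ACCEPT)` alternative, not a translation of it: `first_use_in_block` is a hypothetical helper, it matches the name as a whole word only, and it ignores braces inside strings and comments.

```python
import re


def first_use_in_block(text, open_idx, name):
    """Return the offset of the first whole-word `name` inside the balanced
    block opening at text[open_idx], or None if the block closes first."""
    if text[open_idx] != "{":
        raise ValueError("open_idx must point at '{'")
    token = re.compile(r"[{}]|\b" + re.escape(name) + r"\b")
    depth = 0
    for match in token.finditer(text, open_idx):
        if match.group() == "{":
            depth += 1
        elif match.group() == "}":
            depth -= 1
            if depth == 0:
                return None       # block closed without a use
        else:
            return match.start()  # stop immediately, as (*ACCEPT) would
    return None                   # unbalanced input


# Made-up input: the use sits after a nested block, still inside the loop.
text = "while (a)\n{\n  if (b) { c(); }\n  use(cur);\n}\ncur;"
hit = first_use_in_block(text, text.index("{"), "cur")
```

A use that appears only after the block has closed is not reported, which matches the intent of restricting the search to the loop body.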

