Can a Regex Return the Number of the Line where the Match is Found?

    Question

    In a text editor, I want to replace a given word with the number of the line on which this word is found. Is this possible with regex?

    Solution

    Recursion, Self-Referencing Group (Qtax trick), Reverse Qtax or Balancing Groups

    Introduction

    The idea of adding a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) where one joins to a table of integers. My original answer used the @Qtax trick. The current answers use either Recursion, the Qtax trick (straight or in a reversed variation), or Balancing Groups.

    Yes, it is possible... With some caveats and regex trickery.

    1. The solutions in this answer are meant as a vehicle to demonstrate some regex syntax more than practical answers to be implemented.
    2. At the end of your file, we will paste a list of numbers, each preceded by a delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7. This technique is similar to a famous database hack that uses a table of integers.
    3. For the first two solutions, we need an editor that uses a regex flavor that allows recursion (solution 1) or self-referencing capture groups (solutions 2 and 3). Two come to mind: Notepad++ and EditPad Pro. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.
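    Since these patterns are mostly a syntax demonstration, here, for contrast, is a minimal Python sketch of the practical route in a general-purpose language: no number table is needed, because a replacement callback can count the newlines that precede each match. The function name replace_with_line_numbers is hypothetical, and unlike the regex solutions in this answer it replaces every occurrence of the word, not only the first one.

```python
import re


def replace_with_line_numbers(text: str, word: str) -> str:
    """Replace every occurrence of `word` with the 1-based number of its line."""
    pattern = re.compile(re.escape(word))

    def line_number(match: re.Match) -> str:
        # Newlines before the match start = number of lines above the match.
        return str(text.count("\n", 0, match.start()) + 1)

    return pattern.sub(line_number, text)


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
print(replace_with_line_numbers(sample, "pig"))
```

    Recounting newlines for each match keeps the sketch short; for very large files with many matches, a running counter would avoid rescanning the text.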

    Input file:

    Let's say we are searching for pig and want to replace it with the line number.

    We'll use this as input:

    my cat
    dog
    my pig
    my cow
    my mouse
    
    :1:2:3:4:5:6:7
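    Typing the table by hand gets tedious for longer files. A small Python sketch can generate it; the helper name append_number_table and its reverse flag (handy for the reversed-table optimization discussed later) are hypothetical. The table needs at least one number per line that might be skipped, plus one for the line that contains the match.

```python
def append_number_table(text: str, count: int, delimiter: str = ":", reverse: bool = False) -> str:
    """Append ':1:2:...:count' (or ':count:...:1' if reverse) after a blank line."""
    numbers = range(count, 0, -1) if reverse else range(1, count + 1)
    table = "".join(f"{delimiter}{n}" for n in numbers)
    if not text.endswith("\n"):
        text += "\n"  # make sure the last line of the body is terminated
    # A blank line separates the body from the table, as in the sample input.
    return text + "\n" + table


body = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
print(append_number_table(body, 7))
```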
    


    First Solution: Recursion

    Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).

    The recursive structure lives in a lookahead, and is optional. Its job is to balance lines that don't contain pig, on the left, with numbers, on the right: think of it as balancing a nested construct like {{{ }}}, except that on the left we have the no-match lines, and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.

    Search:

    (?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))
    

    Free-Spacing Version with Comments:

    (?xsm)             # free-spacing, dot-matches-newline, multi-line modes
    (?=.*?pig)        # fail right away if pig isn't there
    
    (?=               # The Recursive Structure Lives In This Lookahead
    (                 # Group 1
       (?:               # skip one line 
          ^              
          (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
          (?:\r?\n)      # newline chars
        ) 
        (?:(?1)|[^:]+)   # recurse Group 1 OR match all chars that are not a :
        (:\d+)           # match digits
    )?                 # End Group 
    )                 # End lookahead. 
    .*?\Kpig                # get to pig
    (?=.*?(?(2)\2):(\d+))   # Lookahead: capture the next digits
    

    Replace: \3

    In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.
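    To make the balancing concrete, the following Python sketch models what the captures should end up holding, without any recursion: it counts the lines before the first line containing pig (the left side of the balance), consumes that many :\d+ tokens from the table (the right side), and reports Group 2 (the last token consumed, i.e. the outermost level's capture) and Group 3 (the next number, used as the replacement). It is a hypothetical model for illustration, not an implementation of the pattern, and it assumes the table is the last line of the input with no trailing newline.

```python
from typing import Optional, Tuple


def model_recursive_captures(text: str, word: str = "pig") -> Optional[Tuple[Optional[str], str]]:
    """Model Group 2 and Group 3 of the recursive solution (illustration only)."""
    body, table = text.rsplit("\n", 1)  # assumption: the table is the last line
    numbers = table.split(":")[1:]      # ":1:2:3" -> ["1", "2", "3"]
    lines = body.split("\n")
    # Left side of the balance: lines skipped before the first line containing word.
    skipped = next((i for i, line in enumerate(lines) if word in line), None)
    if skipped is None:
        return None  # (?=.*?pig) would fail
    # Right side: one ":\d+" per skipped line; the outermost level captures ":k".
    group2 = ":" + numbers[skipped - 1] if skipped else None
    # (?=.*?(?(2)\2):(\d+)) captures the number after ":k" (or the first one).
    group3 = numbers[skipped]
    return group2, group3


doc = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
print(model_recursive_captures(doc))
```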


    Second Solution: Group that Refers to Itself ("Qtax Trick")

    Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookbehind and the possessive quantifier to an atomic group (see the .NET version below).

    Search:

    (?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))
    

    .NET version: Back to the Future

    .NET does not have \K. In its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.

    (?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))
    

    Free-Spacing Version with Comments (Perl / PCRE Version):

    (?xsm)             # free-spacing, dot-matches-newline, multi-line modes
    (?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
    (?:               # start counter-line-skipper (lines that don't include pig)
       (?:               # skip one line 
          ^              # beginning of line
          (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
          (?:\r?\n)      # newline chars
        )   
       # for each line skipped, let Group 1 match an ever increasing portion of the numbers string at the bottom
       (?=             # lookahead
          [^:]+           # skip all chars that are not colons
          (               # start Group 1
            (?(1)\1)      # match Group 1 if set
            :\d+          # match a colon and some digits
          )               # end Group 1
       )               # end lookahead
    )*+               # end counter-line-skipper: zero or more times
    .*?               # lazily match the chars before pig
    \K                # drop everything we've matched so far
    pig               # match pig (this is the match!)
    (?=[^:]+(?(1)\1):(\d+))   # capture the next number to Group 2
    

    Replace:

    \2
    

    Output:

    my cat
    dog
    my 3
    my cow
    my mouse
    
    :1:2:3:4:5:6:7
    

    In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.
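    The self-referencing group may be easier to follow as a trace. This Python sketch (a hypothetical model, not a regex implementation) replays the line-skipper: for each skipped line, Group 1 becomes its previous value followed by the next :\d+ token from the table, which is the role of (?=[^:]+((?(1)\1):\d+)). The final lookahead then reads the number right after Group 1. It assumes the table is the last line of the input and that the body contains no colons.

```python
from typing import List, Optional, Tuple


def model_qtax_trace(text: str, word: str = "pig") -> Optional[Tuple[List[str], str]]:
    """Return the successive values of Group 1 and the final Group 2 (illustration only)."""
    body, table = text.rsplit("\n", 1)  # assumption: the table is the last line
    numbers = table.split(":")[1:]
    history: List[str] = []
    group1 = ""
    for line in body.split("\n"):
        if word in line:
            break  # the skipper stops at the line containing word
        # (?=[^:]+((?(1)\1):\d+)): old Group 1 plus one more ":\d+" token
        group1 += ":" + numbers[len(history)]
        history.append(group1)
    else:
        return None  # word not found: (?=.*?pig) would fail
    # (?=[^:]+(?(1)\1):(\d+)): the number right after Group 1 becomes Group 2
    group2 = numbers[len(history)]
    return history, group2


doc = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
print(model_qtax_trace(doc))
```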

    Choice of Delimiter for Digits

    In our example, the delimiter : for the string of digits is rather common and could occur elsewhere in the file. We could invent a UNIQUE_DELIMITER and tweak the expression slightly, but the following optimization is even more efficient and lets us keep the : delimiter.
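    A quick way to see the risk: the skipping step [^:]+ stops at the first colon anywhere in the rest of the input, not necessarily at the number table. The Python sketch below uses a made-up body line containing a time (10:30) and, as a hypothetical UNIQUE_DELIMITER, the character §; the helper name prefix_before_delimiter is also hypothetical.

```python
import re


def prefix_before_delimiter(text: str, delimiter: str) -> str:
    """Return what the skipping step [^<delimiter>]+ consumes from the start of text."""
    match = re.match("[^" + re.escape(delimiter) + "]+", text)
    return match.group() if match else ""


body = "meeting at 10:30\nmy pig\n\n"
# With ":" the skip stops inside the body, at "10:30", not at the table.
print(repr(prefix_before_delimiter(body + ":1:2:3", ":")))
# With a delimiter that never appears in the body, it reaches the table.
print(repr(prefix_before_delimiter(body + "§1§2§3", "§")))
```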


    Optimization on Second Solution: Reverse String of Digits

    Instead of pasting our digits in order, it may be to our benefit to use them in the reverse order: :7:6:5:4:3:2:1

    In our lookaheads, this allows us to get down to the bottom of the input with a simple .*, and to start backtracking from there. Since we know we're at the end of the string, we don't have to worry about the :digits being part of another section of the string. Here's how to do it.

    Input:

    my cat pi g
    dog p ig
    my pig
    my cow
    my mouse
    
    :7:6:5:4:3:2:1
    

    Search:

    (?xsm)             # free-spacing, dot-matches-newline, multi-line modes
    (?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
    (?:               # start counter-line-skipper (lines that don't include pig)
       (?:               # skip one line that doesn't have pig
          ^              # beginning of line
          (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
          (?:\r?\n)      # newline chars
        )   
       # Group 1 matches increasing portion of the numbers string at the bottom
       (?=             # lookahead
          .*           # get to the end of the input
          (               # start Group 1
            :\d+          # match a colon and some digits
            (?(1)\1)      # match Group 1 if set
          )               # end Group 1
       )               # end lookahead
    )*+               # end counter-line-skipper: zero or more times
    .*?               # lazily match the chars before pig
    \K                # drop match so far
    pig               # match pig (this is the match!)
    (?=.*:(\d+)(?(1)\1))   # capture the next number (all of its digits) to Group 2
    

    Replace: \2

    See the substitutions in the demo.
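    In the reversed variant, Group 1 grows leftward from the end of the table, so after k skipped lines it holds the last k tokens (for example :2:1), and the whole number immediately to its left is the replacement. Here is a hypothetical Python model of that bookkeeping, assuming the reversed table is the last line of the input:

```python
from typing import Optional, Tuple


def model_reversed_table(text: str, word: str = "pig") -> Optional[Tuple[Optional[str], str]]:
    """Model Group 1 and Group 2 of the reversed-table variant (illustration only)."""
    body, table = text.rsplit("\n", 1)  # assumption: ":7:6:...:1" is the last line
    numbers = table.split(":")[1:]      # ["7", "6", ..., "1"]
    lines = body.split("\n")
    skipped = next((i for i, line in enumerate(lines) if word in line), None)
    if skipped is None:
        return None
    # After k skipped lines Group 1 is the last k tokens, e.g. ":2:1"; unset when k == 0.
    tail = numbers[len(numbers) - skipped:]
    group1 = "".join(":" + n for n in tail) or None
    # Group 2 is the whole number immediately to the left of Group 1.
    group2 = numbers[len(numbers) - skipped - 1]
    return group1, group2


doc = "my cat pi g\ndog p ig\nmy pig\nmy cow\nmy mouse\n\n:7:6:5:4:3:2:1"
print(model_reversed_table(doc))
```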

    Third Solution: Balancing Groups

    This solution is specific to .NET.

    Search:

    (?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))
    

    Free-Spacing Version with Comments:

    (?xm)                # free-spacing, multi-line
    (?<=                 # lookbehind
       \A                # start of the input
       (?<c>               # skip one line that doesn't have pig
                           # The length of Group c Captures will serve as a counter
         ^                    # beginning of line
         (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
         (?:\r?\n)            # newline chars
       )                   # end skipper
       *                   # repeat skipper
       .*?                 # we're on the pig line: lazily match chars before pig
       )                # end lookbehind
    pig                 # match pig: this is the match
    (?=                 # lookahead
       [^:]+               # get to the digits
       (?(c)               # if Group c has been set
         (?<-c>:\d+)         # decrement c while we match a group of digits
         *                   # repeat: this will only repeat as long as the length of Group c captures > 0 
       )                   # end if Group c has been set
       :(\d+)              # Match the next digit group, capture the digits
    )                    # end lookahead
    

    Replace: $1
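    Group c behaves like a stack: each skipped line pushes a capture, and each (?<-c>:\d+) pops one while consuming a number, so the capture that follows the pops lands on the line number. Here is a hypothetical Python model of that push/pop bookkeeping (not .NET code), assuming the table is the last line of the input:

```python
from typing import List, Optional


def model_balancing_groups(text: str, word: str = "pig") -> Optional[str]:
    """Model the value captured into $1 by the balancing-group solution (illustration only)."""
    body, table = text.rsplit("\n", 1)  # assumption: the table is the last line
    numbers = table.split(":")[1:]
    c: List[str] = []  # capture stack of Group c
    for line in body.split("\n"):
        if word in line:
            break
        c.append(line + "\n")  # (?<c>...) pushes one capture per skipped line
    else:
        return None  # no line contains word
    consumed = 0
    while c:  # (?<-c>:\d+)* pops a capture for every ":\d+" it matches
        c.pop()
        consumed += 1
    return numbers[consumed]  # :(\d+) captures the next number into $1


doc = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
print(model_balancing_groups(doc))
```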

