Can a Regex Return the Number of the Line where the Match is Found?


Question

In a text editor, I want to replace a given word with the number of the line on which this word is found. Is this possible with regex?

Recommended answer

Recursion, Self-Referencing Group (Qtax Trick), Reverse Qtax or Balancing Groups

Introduction

The idea of adding a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) where one joins to a table of integers. My original answer used the @Qtax trick. The current answers use either Recursion, the Qtax trick (straight or in a reversed variation), or Balancing Groups.

Yes, it is possible... with some caveats and regex trickery.

  1. The solutions in this answer are meant more as a vehicle for demonstrating some regex syntax than as practical answers to be implemented.
  2. At the end of your file, we will paste a list of numbers preceded by a unique delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7. This technique is similar to a famous database hack that uses a table of integers.
  3. For the first two solutions, we need an editor whose regex flavor allows recursion (solution 1) or self-referencing capture groups (solution 2 and its reversed variation). Two come to mind: Notepad++ and EditPad Pro. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.
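Since the solutions here are demonstrations rather than practical answers, it may help to keep in mind what a plain script would do with the same task. The following is a minimal Python sketch, not part of the original answer: the function name `replace_with_line_number` and the choice to match whole words with `\b` are assumptions made for illustration.

```python
import re


def replace_with_line_number(text: str, word: str) -> str:
    """Replace each whole-word occurrence of `word` with its 1-based line number."""
    # re.escape keeps any regex metacharacters in `word` literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    lines = text.split("\n")
    return "\n".join(
        pattern.sub(str(number), line)
        for number, line in enumerate(lines, start=1)
    )


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse"
print(replace_with_line_number(sample, "pig"))
```

Unlike the editor-only patterns, this handles every occurrence on every line and needs no list of numbers appended to the input.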

Input file:

Let's say we are searching for pig and want to replace it with the line number.

We'll use this as input:

my cat
dog
my pig
my cow
my mouse

:1:2:3:4:5:6:7
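The appended list does not have to be typed by hand. A small Python sketch for generating it follows; the helper name `number_tail` and its `reverse` flag are hypothetical, and the only assumption about the format is the `:` delimiter shown above. The reversed order is the one used in the delimiter discussion further down.

```python
def number_tail(count: int, reverse: bool = False) -> str:
    """Build the delimiter-prefixed list of integers, e.g. ':1:2:3'."""
    numbers = range(1, count + 1)
    if reverse:
        numbers = reversed(numbers)
    return "".join(f":{n}" for n in numbers)


print(number_tail(7))                # :1:2:3:4:5:6:7
print(number_tail(7, reverse=True))  # :7:6:5:4:3:2:1
```

The list needs at least as many entries as the line number on which the word may appear.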


First solution: Recursion

Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).

The recursive structure lives in a lookahead, and is optional. Its job is to balance lines that don't contain pig, on the left, with numbers, on the right: think of it as balancing a nested construct like {{{ }}}... Except that on the left we have the no-match lines, and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.
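The bookkeeping that this lookahead performs can be modelled procedurally. The Python sketch below is not the regex itself (Python's built-in `re` module has no recursion); it only mirrors the idea that each skipped line is paired off with one number from the tail, so the next number is the line number. The function name is hypothetical, only the first occurrence is handled, and the sketch assumes the number list sits on the last line in the `:1:2:3` format.

```python
from typing import Optional


def first_match_line_number(text: str, word: str) -> Optional[str]:
    """Pair each skipped line with one number; return the next number."""
    # Split the input into the real lines and the appended number list.
    body, _, tail = text.rstrip("\n").rpartition("\n")
    numbers = tail.lstrip(":").split(":")
    skipped = 0
    for line in body.split("\n"):
        if word in line:
            # `skipped` lines were balanced against numbers[0:skipped];
            # the number that follows them is the answer.
            return numbers[skipped] if skipped < len(numbers) else None
        skipped += 1
    return None


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n\n:1:2:3:4:5:6:7"
print(first_match_line_number(sample, "pig"))  # 3
```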

Search:

(?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))

Free-spacing version with comments:

(?xsm)             # free-spacing, dot-matches-newline and multi-line modes
(?=.*?pig)        # fail right away if pig isn't there

(?=               # The Recursive Structure Lives In This Lookahead
(                 # Group 1
   (?:               # skip one line 
      ^              
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    ) 
    (?:(?1)|[^:]+)   # recurse Group 1 OR match all chars that are not a :
    (:\d+)           # Group 2: match a colon and some digits
)?                 # End Group 1 (optional)
)                 # End lookahead. 
.*?\Kpig                # get to pig, dropping what was matched before it
(?=.*?(?(2)\2):(\d+))   # Lookahead: capture the next digits to Group 3

Replace: \3

In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.

Second solution: Self-referencing group (Qtax trick)

Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookbehind and the possessive quantifier to an atomic group (see the .NET version a few lines below).

Search:

(?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))

.NET version: Back to the Future

.NET does not have \K. In its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.

(?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))

Free-spacing version with comments (Perl/PCRE version):

(?xsm)             # free-spacing, dot-matches-newline and multi-line modes
(?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
(?:               # start counter-line-skipper (lines that don't include pig)
   (?:               # skip one line 
      ^              # beginning of line
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    )   
   # for each line skipped, let Group 1 match an ever increasing portion of the numbers string at the bottom
   (?=             # lookahead
      [^:]+           # skip all chars that are not colons
      (               # start Group 1
        (?(1)\1)      # match Group 1 if set
        :\d+          # match a colon and some digits
      )               # end Group 1
   )               # end lookahead
)*+               # end counter-line-skipper: zero or more times
.*?               # lazily match up to pig
\K                # drop everything we've matched so far
pig               # match pig (this is the match!)
(?=[^:]+(?(1)\1):(\d+))   # capture the next number to Group 2

Replace:

\2

Output:

my cat
dog
my 3
my cow
my mouse

:1:2:3:4:5:6:7

In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.

Choice of delimiter for the digits

In our example, the delimiter : for the string of digits is rather common, and could occur elsewhere in the input. We can invent a UNIQUE_DELIMITER and tweak the expression slightly. But the following optimization is even more efficient and lets us keep the :

Instead of pasting our digits in order, it may be to our benefit to use them in the reverse order: :7:6:5:4:3:2:1

In our lookaheads, this allows us to get down to the bottom of the input with a simple .*, and to start backtracking from there. Since we know we're at the end of the string, we don't have to worry about the :digits being part of another section of the string. Here's how to do it.
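To picture what the reversed list buys us, here is a Python model of the lookup (again a sketch of the idea, not the pattern itself; the function name and return shape are hypothetical). After some lines have been skipped, Group 1 covers that many entries at the very end of the input, and the entry just in front of that suffix is the line number.

```python
from typing import List, Tuple


def reversed_tail_lookup(tail: str, skipped_lines: int) -> Tuple[List[str], str]:
    """Return (entries covered by Group 1, captured line number)."""
    numbers = tail.strip(":").split(":")
    # Group 1 has grown from the end of the string, one entry per skipped line.
    group1 = numbers[len(numbers) - skipped_lines:]
    # The number immediately before that suffix is what gets captured.
    answer = numbers[len(numbers) - skipped_lines - 1]
    return group1, answer


print(reversed_tail_lookup(":7:6:5:4:3:2:1", 2))  # (['2', '1'], '3')
```

Because the suffix is anchored to the end of the input, a stray :digits sequence elsewhere in the text cannot be mistaken for part of the list.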

Input:

my cat pi g
dog p ig
my pig
my cow
my mouse

:7:6:5:4:3:2:1

Search:

(?xsm)             # free-spacing, dot-matches-newline and multi-line modes
(?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
(?:               # start counter-line-skipper (lines that don't include pig)
   (?:               # skip one line that doesn't have pig
      ^              # beginning of line
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    )   
   # Group 1 matches increasing portion of the numbers string at the bottom
   (?=             # lookahead
      .*           # get to the end of the input
      (               # start Group 1
        :\d+          # match a colon and some digits
        (?(1)\1)      # match Group 1 if set
      )               # end Group 1
   )               # end lookahead
)*+               # end counter-line-skipper: zero or more times
.*?               # lazily match up to pig
\K                # drop match so far
pig               # match pig (this is the match!)
(?=.*(\d+)(?(1)\1))   # capture the next number to Group 2

Replace: \2

See the substitutions in the demo.

Third solution: Balancing groups

This solution is specific to .NET.

Search:

(?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))

Free-spacing version with comments:

(?xm)                # free-spacing, multi-line
(?<=                 # lookbehind
   \A                # beginning of the string
   (?<c>               # skip one line that doesn't have pig
                       # The length of Group c Captures will serve as a counter
     ^                    # beginning of line
     (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
     (?:\r?\n)            # newline chars
   )                   # end skipper
   *                   # repeat skipper
   .*?                 # we're on the pig line: lazily match chars before pig
   )                # end lookbehind
pig                 # match pig: this is the match
(?=                 # lookahead
   [^:]+               # get to the digits
   (?(c)               # if Group c has been set
     (?<-c>:\d+)         # decrement c while we match a group of digits
     *                   # repeat: this will only repeat as long as the length of Group c captures > 0 
   )                   # end if Group c has been set
   :(\d+)              # Match the next digit group, capture the digits
)                    # end lookahead

Replace: $1
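The counter kept by Group c can be pictured as a stack. The Python sketch below only imitates that behaviour (Python has no balancing groups, and the function name is hypothetical): each skipped line pushes one capture, each number consumed pops one, and once the stack is empty the next number is the one captured.

```python
from typing import List


def balancing_group_lookup(lines_before_match: List[str], tail: str) -> str:
    """Imitate (?<c>...) pushes and (?<-c>...) pops with an explicit stack."""
    stack = []
    for line in lines_before_match:
        stack.append(line)      # (?<c>...) pushes one capture per skipped line
    numbers = iter(tail.strip(":").split(":"))
    while stack:
        stack.pop()             # (?<-c>:\d+) pops one capture...
        next(numbers)           # ...for each number it consumes
    return next(numbers)        # :(\d+) captures the following number


print(balancing_group_lookup(["my cat", "dog"], ":1:2:3:4:5:6:7"))  # 3
```

The sketch assumes the list holds more numbers than there are skipped lines; otherwise `next` raises `StopIteration`, much as the pattern would simply fail to match.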
