Java regex: how to back-reference capturing groups in a certain context when their number is not known in advance

Views: 103 Published: 2020/6/7 19:18:14 Tags: java regex capturing-group


Problem description

As an introductory note, I am aware of the old saying about solving problems with regex, and I am also aware of the precautions about processing XML with regex. But please bear with me for a moment...

I am trying to do a regex search and replace on a group of characters. I don't know in advance how often this group will be matched, and I want to search within a certain context only.

An example: if I have the string "abdfabsdfabfdsaabbb" and I want to search for "ab" and replace it with "@ab@", this works fine using the following regex:

Search regex:

(.*?)(ab)(.*?)

Replacement:

$1@$2@$3
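The search and replacement above can be tried out with a small Java sketch. It assumes the example string is abdfabsdfabfdsaabbb (the same text that appears inside the <context> element later in the question); the class and method names are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainReplaceDemo {
    // Search regex taken from the question.
    static final Pattern SEARCH = Pattern.compile("(.*?)(ab)(.*?)");

    // Wraps every "ab" in "@...@"; $1, $2 and $3 always refer to the
    // groups of the match currently being replaced.
    static String mark(String input) {
        return SEARCH.matcher(input).replaceAll("$1@$2@$3");
    }

    // Counts how many separate matches the regex produces.
    static int countMatches(String input) {
        Matcher m = SEARCH.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "abdfabsdfabfdsaabbb"; // assumed example string
        System.out.println(countMatches(input)); // 4
        System.out.println(mark(input));         // @ab@df@ab@sdf@ab@fdsa@ab@bb
    }
}
```

Because each call to find() produces an independent match, the group numbers 1 to 3 are reused for every occurrence, which is the behaviour the question wants to preserve.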

I get four matches in total, as expected. Within each match the group numbers are the same, so the back-references ($1, $2, ...) work fine, too.

However, if I now add a certain context to the string, the regex above fails:

Search string:

<context>abdfabsdfabfdsaabbb</context>

Search regex:

<context>(.*?)(ab)(.*?)</context>

This finds only the first match. Even if I wrap the original regex in a repeated non-capturing group, it doesn't work ("<context>(?:(.*?)(ab)(.*?))*</context>").
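A minimal Java sketch of why both attempts fall short: each pattern consumes the whole <context>...</context> element in a single match, so find() succeeds only once. In the repeated variant, a capturing group inside a repetition keeps only the text of its last iteration. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextSingleMatchDemo {
    // Both patterns are taken from the question.
    static final Pattern SINGLE =
            Pattern.compile("<context>(.*?)(ab)(.*?)</context>");
    static final Pattern REPEATED =
            Pattern.compile("<context>(?:(.*?)(ab)(.*?))*</context>");

    // Returns group 3 of every match found by the given pattern.
    static List<String> thirdGroups(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "<context>abdfabsdfabfdsaabbb</context>";
        // One match only: group 3 swallows everything after the first "ab".
        System.out.println(thirdGroups(SINGLE, input));
        // Still one match: the groups hold values from the last repetition only.
        System.out.println(thirdGroups(REPEATED, input));
    }
}
```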

What I would like is a list of matches as in the first search (without the context), where the group numbers are the same within each match.

Any idea how to achieve this?

Solution

Your requirement is similar to the one in this question: match and capture multiple instances of a pattern between a prefix and a suffix. Using the method described in my answer there:

(?s)(?:<context>|(?!^)\G)(?:(?!</context>|ab).)*ab

Add capturing groups as needed.
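As one possible way to add capturing groups, the sketch below wraps the three parts of the regex in groups so that the question's "@ab@" replacement can be applied inside the context. The particular grouping, the replacement string "$1$2@$3@" and the class name are assumptions made for illustration, not part of the original answer.

```java
import java.util.regex.Pattern;

class ContextReplaceDemo {
    // Group 1: "<context>" or the empty continuation point (\G).
    // Group 2: text skipped before the next "ab".
    // Group 3: the "ab" itself.
    static final Pattern IN_CONTEXT = Pattern.compile(
            "(?s)(<context>|(?!^)\\G)((?:(?!</context>|ab).)*)(ab)");

    // Wraps every "ab" that occurs inside a <context> element in "@...@".
    static String mark(String input) {
        return IN_CONTEXT.matcher(input).replaceAll("$1$2@$3@");
    }

    public static void main(String[] args) {
        System.out.println(mark("<context>abdfabsdfabfdsaabbb</context>"));
        // <context>@ab@df@ab@sdf@ab@fdsa@ab@bb</context>

        // Occurrences outside the element are left untouched.
        System.out.println(mark("ab<context>xaby</context>ab"));
        // ab<context>x@ab@y</context>ab
    }
}
```

Every occurrence is now a separate match, so groups 1 to 3 carry the same meaning in each one, no matter how many occurrences the element contains.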

Note that the regex only works for tags that are allowed to contain only text. If a tag contains other tags, it won't work correctly.

It also matches ab inside a <context> tag that has no closing </context> tag. If you want to prevent this, use:

(?s)(?:<context>(?=.*?</context>)|(?!^)\G)(?:(?!</context>|ab).)*ab
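The difference between the two variants can be checked with a short Java sketch. The input with a dangling second <context> tag is made up for this illustration, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClosingTagGuardDemo {
    // Variant without the look-ahead for the closing tag.
    static final Pattern UNGUARDED = Pattern.compile(
            "(?s)(?:<context>|(?!^)\\G)(?:(?!</context>|ab).)*ab");
    // Variant that only enters <context> if a </context> follows somewhere.
    static final Pattern GUARDED = Pattern.compile(
            "(?s)(?:<context>(?=.*?</context>)|(?!^)\\G)(?:(?!</context>|ab).)*ab");

    static int countMatches(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The second <context> element is never closed.
        String input = "<context>ab</context><context>ab";
        System.out.println(countMatches(UNGUARDED, input)); // 2
        System.out.println(countMatches(GUARDED, input));   // 1
    }
}
```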

Explanation

Let us break down the regex:

(?s)                        # Make . match any character, including line terminators
(?:
  <context>
    |
  (?!^)\G
)
(?:(?!</context>|ab).)*
ab
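In Java, a commented breakdown like this can be compiled directly by passing the Pattern.COMMENTS flag, which ignores whitespace in the pattern and treats # as the start of a comment. The sketch below is one way to write it; the comments inside the pattern are paraphrased and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerbosePatternDemo {
    // With Pattern.COMMENTS, whitespace is ignored and # starts a comment
    // that runs to the end of the line.
    static final Pattern VERBOSE = Pattern.compile(
            "(?s)                      # let . match line terminators too\n"
          + "(?:\n"
          + "  <context>               # enter a new context element\n"
          + "    |\n"
          + "  (?!^)\\G                # or resume where the previous match ended\n"
          + ")\n"
          + "(?:(?!</context>|ab).)*   # skip text up to the next ab or closing tag\n"
          + "ab                        # the sub-pattern we are after\n",
            Pattern.COMMENTS);

    static int countMatches(String input) {
        Matcher m = VERBOSE.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("<context>abdfabsdfabfdsaabbb</context>")); // 4
    }
}
```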

(?:<context>|(?!^)\G) makes sure that we either enter a new <context> tag, or continue from the end of the previous match and attempt to match more instances of the sub-pattern. The (?!^) is needed because \G also matches at the very beginning of the input, where there is no previous match to continue from.

(?:(?!</context>|ab).)* matches whatever text we don't care about (anything that is not ab) and prevents us from going past the closing tag </context>. Then we match the pattern we want, ab, at the end.
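To see both parts at work, the following sketch lists the start and end offsets of every match. The sample input, which has an extra ab after the closing tag, and the class name are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpansDemo {
    static final Pattern P = Pattern.compile(
            "(?s)(?:<context>|(?!^)\\G)(?:(?!</context>|ab).)*ab");

    // Returns "start-end" for every match, in order.
    static List<String> spans(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        // The last "ab" sits after </context> and must not be matched.
        System.out.println(spans("<context>abdfab</context>ab"));
        // [0-11, 11-15]
    }
}
```

The second match starts at offset 11, exactly where the first one ended, which is the \G continuation. Nothing is matched from offset 15 onwards because the negative look-ahead refuses to step over </context>, and once the chain is broken \G can no longer match.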

