Need Regex to match special situations

Views: 78 | Published: 2021/6/14 20:46:32 | Tags: php regex pcre


Question

I'm desperately searching for regular expressions that match these scenarios:

1) Match alternating characters

I have a string like "This is my foobababababaf string" and I want to match "babababa".

The only thing I know is the length of the fragment to search for. I don't know which characters or digits it may contain, but they alternate.

I really have no clue where to start :(

2) Match combined groups

In a string like "This is my foobaafoobaaaooo string" I want to match "aaaooo". As in 1), I don't know which characters or digits these may be. I only know that they appear in two groups.

I experimented with (.)\1\1\1(.)\1\1\1 and things like this...

Recommended answer

I think something like this is what you want.

For alternating characters:

(?=(.)(?!\1)(.))(?:\1\2){2,}

\0 will be the entire alternating sequence; \1 and \2 are the two (distinct) alternating characters.
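The question says the fragment length is known, whereas the pattern above accepts any number of pairs from two upward. As a minimal sketch (not part of the original answer), the quantifier could be pinned to half the known length. The class name AlternatingFragment and the helpers alternating and firstFragment are hypothetical, and the sketch assumes the length is even.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternatingFragment {
    // Builds the alternating-pair pattern with the pair count fixed to length / 2.
    // Assumption (not from the original answer): the fragment length is even.
    static String alternating(int length) {
        if (length < 2 || length % 2 != 0) {
            throw new IllegalArgumentException("length must be even and >= 2");
        }
        return "(?=(.)(?!\\1)(.))(?:\\1\\2){" + (length / 2) + "}";
    }

    // Returns the first alternating fragment of the given length, or null if none is found.
    static String firstFragment(String text, int length) {
        Matcher m = Pattern.compile(alternating(length)).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The asker's example: a fragment of length 8.
        System.out.println(firstFragment("This is my foobababababaf string", 8));
    }
}
```

With the asker's sample string and a length of 8, this should print babababa, the first eight characters of the longer alternating sequence.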

For a run of N characters and a run of M characters, possibly separated by other characters (replace N and M with numbers here):

(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M}

\0 will be the entire match, including the infix. \1 is the character repeated (at least) N times, and \2 is the character repeated (at least) M times.

Here's a test harness in Java.
import java.util.regex.*;

public class Regex3 {
    static String runNrunM(int N, int M) {
        return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
            .replace("N", String.valueOf(N))
            .replace("M", String.valueOf(M));
    }
    static void dumpMatches(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        System.out.println(text + " <- " + pattern);
        while (m.find()) {
            System.out.println("  match");
            for (int g = 0; g <= m.groupCount(); g++) {
                System.out.format("    %d: [%s]%n", g, m.group(g));
            }
        }
    }
    public static void main(String[] args) {
        String[] tests = {
            "foobababababaf foobaafoobaaaooo",
            "xxyyyy axxayyyya zzzzzzzzzzzzzz"
        };
        for (String test : tests) {
            dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
        }
        for (String test : tests) {
            dumpMatches(test, runNrunM(3, 3));
        }
        for (String test : tests) {
            dumpMatches(test, runNrunM(2, 4));
        }
    }
}

This produces the following output:
foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
  match
    0: [bababababa]
    1: [b]
    2: [a]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
  match
    0: [aaaooo]
    1: [a]
    2: [o]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
  match
    0: [yyyy axxayyyya zzz]
    1: [y]
    2: [z]
foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
  match
    0: [xxyyyy]
    1: [x]
    2: [y]
  match
    0: [xxayyyy]
    1: [x]
    2: [y]

Explanation

(?=(.)(?!\1)(.))(?:\1\2){2,} has two parts:

    (?=(.)(?!\1)(.)) establishes \1 and \2 using lookahead.
        The nested negative lookahead ensures that \1 != \2.
        Using lookahead to capture lets \0 hold the entire match (instead of just the "tail" end).
    (?:\1\2){2,} then matches the \1\2 pair, which must repeat at least twice.

(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} has three parts:

    (?=(.))\1{N} captures \1 in a lookahead, and then matches it N times.
        Using lookahead to capture means the repetition can be N instead of N-1.
    .*? allows an infix between the two runs; it is reluctant, so the infix is kept as short as possible.
    (?=(?!\1)(.))\2{M} is similar to the first part.
        The nested negative lookahead ensures that \1 != \2.

The run regex will also match within longer runs, e.g. run(2,2) finds a match in "xxxyyy":
xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
  match
    0: [xxxyy]
    1: [x]
    2: [y]

Also, it does not allow overlapping matches. That is, there is only one run(2,3) in "xx11yyy222":
xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
  match
    0: [xx11yyy]
    1: [x]
    2: [y]
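If overlapping matches are wanted, one possible approach (not part of the original answer) is to restart the search one character after the start of each match, using Matcher.find(int). The sketch below rebuilds the same pattern as runNrunM in the harness above so that it stands alone. The class name OverlappingRuns and the helper overlapping are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingRuns {
    // Same pattern as runNrunM in the harness, built by concatenation.
    static String runNrunM(int n, int m) {
        return "(?=(.))\\1{" + n + "}.*?(?=(?!\\1)(.))\\2{" + m + "}";
    }

    // Collects matches that may overlap: after each match, the next search
    // starts one character past the previous match's start, not past its end.
    static List<String> overlapping(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        List<String> found = new ArrayList<>();
        int from = 0;
        while (from <= text.length() && m.find(from)) {
            found.add(m.group());
            from = m.start() + 1;
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(overlapping("xx11yyy222", runNrunM(2, 3)));
    }
}
```

For "xx11yyy222" and run(2,3) this should print [xx11yyy, 11yyy, yyy222, yy222]. Because the infix is reluctant, each starting run is paired with the nearest qualifying second run, so "11" pairs with "yyy" rather than with "222".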

