Need a regex to match special situations
Question
I'm desperately searching for regular expressions that match these scenarios:
1) Match alternating characters
I have a string like "This is my foobababababaf string", and I want to match "babababa".
The only thing I know is the length of the fragment to search for. I don't know which characters or digits it contains, only that they alternate.
I really have no clue where to start :(
2) Match combined groups
In a string like "This is my foobaafoobaaaooo string", I want to match "aaaooo". As in 1), I don't know which characters or digits they might be. I only know that they will appear in two groups.
I experimented with (.)\1\1\1(.)\1\1\1 and things like this...
Recommended answer
I think something like this is what you want.
For alternating characters:
(?=(.)(?!\1)(.))(?:\1\2){2,}
\0 will be the entire alternating sequence; \1 and \2 are the two (distinct) alternating characters.
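The question states that only the fragment length is known. As a minimal sketch under that assumption, the open-ended {2,} can be swapped for an exact pair count. The helper names alternatingOfLength and firstAlternating, and the restriction to even lengths, are illustrative choices and not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternatingByLength {
    // Hypothetical helper: builds the alternating-pair pattern for a fragment
    // of exactly `length` characters (assumed even, i.e. whole pairs only).
    static String alternatingOfLength(int length) {
        if (length < 2 || length % 2 != 0) {
            throw new IllegalArgumentException("length must be even and >= 2");
        }
        return "(?=(.)(?!\\1)(.))(?:\\1\\2){" + (length / 2) + "}";
    }

    // Returns the first alternating fragment of the requested length,
    // or null when the text contains none.
    static String firstAlternating(String text, int length) {
        Matcher m = Pattern.compile(alternatingOfLength(length)).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAlternating("This is my foobababababaf string", 8));
    }
}
```

For the sample sentence and a length of 8 this yields babababa, the fragment the question asks for. Because the count is exact, the longer sequence bababababa is cut off after eight characters.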
For a run of N and M characters, possibly separated by other characters (replace N and M with numbers here):
(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M}
\0 will be the entire match, including the infix. \1 is the character repeated (at least) N times, and \2 is the character repeated (at least) M times.
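In the question's example the two runs sit directly next to each other (aaaooo). If no infix should be allowed, one possible variant simply drops the .*? in the middle. This is an assumption about the asker's intent, not part of the answer, and the helper name adjacentRuns is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdjacentRuns {
    // Hypothetical variant of the run pattern with the ".*?" infix removed,
    // so the run of m characters must start right after the run of n.
    static String adjacentRuns(int n, int m) {
        return "(?=(.))\\1{" + n + "}(?=(?!\\1)(.))\\2{" + m + "}";
    }

    // Collects all non-overlapping matches of the pattern in the text.
    static List<String> findAll(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("This is my foobaafoobaaaooo string", adjacentRuns(3, 3)));
        System.out.println(findAll("xxyyyy axxayyyya", adjacentRuns(2, 4)));
    }
}
```

With the infix gone, axxayyyya no longer matches for (2, 4), because the a between xx and yyyy breaks adjacency.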
Here's a test harness in Java.
import java.util.regex.*;

public class Regex3 {
    // Builds the run pattern by substituting N and M into the template.
    static String runNrunM(int N, int M) {
        return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
            .replace("N", String.valueOf(N))
            .replace("M", String.valueOf(M));
    }

    // Prints every match of pattern in text, including all capture groups.
    static void dumpMatches(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        System.out.println(text + " <- " + pattern);
        while (m.find()) {
            System.out.println(" match");
            for (int g = 0; g <= m.groupCount(); g++) {
                System.out.format(" %d: [%s]%n", g, m.group(g));
            }
        }
    }

    public static void main(String[] args) {
        String[] tests = {
            "foobababababaf foobaafoobaaaooo",
            "xxyyyy axxayyyya zzzzzzzzzzzzzz"
        };
        // Alternating characters
        for (String test : tests) {
            dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
        }
        // Runs of 3 and 3
        for (String test : tests) {
            dumpMatches(test, runNrunM(3, 3));
        }
        // Runs of 2 and 4
        for (String test : tests) {
            dumpMatches(test, runNrunM(2, 4));
        }
    }
}
This produces the following output:
foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
match
0: [bababababa]
1: [b]
2: [a]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
match
0: [aaaooo]
1: [a]
2: [o]
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
match
0: [yyyy axxayyyya zzz]
1: [y]
2: [z]
foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
match
0: [xxyyyy]
1: [x]
2: [y]
match
0: [xxayyyy]
1: [x]
2: [y]
Explanation
(?=(.)(?!\1)(.))(?:\1\2){2,} has two parts:
- (?=(.)(?!\1)(.)) establishes \1 and \2 using a lookahead
  - The nested negative lookahead ensures that \1 != \2
  - Using a lookahead to capture lets \0 have the entire match (instead of just the "tail" end)
- (?:\1\2){2,} then matches the \1\2 pair at least twice

(?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} works along the same lines:
- (?=(.))\1{N} captures \1 in a lookahead and then matches it N times
  - Using a lookahead to capture means the repetition can be N instead of N-1
- .*? lazily matches the optional infix between the two runs
- (?=(?!\1)(.))\2{M} is similar to the first part
  - The nested negative lookahead ensures that \1 != \2
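The point about N versus N-1 can be checked directly. The sketch below compares a consuming capture, (.)\1{2}, with the lookahead capture, (?=(.))\1{3}; both describe a run of three identical characters. The class name and the sample string are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadCaptureDemo {
    // Collects all non-overlapping matches of the pattern in the text.
    static List<String> findAll(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "aaabbbbcc";
        // Consuming capture: the group eats one character, so only 2 repeats remain.
        List<String> consuming = findAll(text, "(.)\\1{2}");
        // Lookahead capture: nothing is consumed, so the count is the full 3.
        List<String> lookahead = findAll(text, "(?=(.))\\1{3}");
        System.out.println(consuming);
        System.out.println(lookahead);
        System.out.println(consuming.equals(lookahead));
    }
}
```

Both patterns find the same runs; the lookahead form just keeps the number in the quantifier equal to the run length, which is convenient when N is substituted in as text.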
The run regex will match longer runs, e.g. run(2,2) matches "xxxyyy":
xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
match
0: [xxxyy]
1: [x]
2: [y]
Also, it does not allow overlapping matches. That is, there is only one run(2,3) in "xx11yyy222":
xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
match
0: [xx11yyy]
1: [x]
2: [y]
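If overlapping matches are wanted, one common workaround (not part of the original answer) is to restart the search one character past the start of each match with Matcher.find(int). A minimal sketch, with findOverlapping as a made-up helper name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {
    // Hypothetical helper: collects matches that may overlap by resuming
    // the search one character after the start of the previous match.
    static List<String> findOverlapping(String text, String pattern) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(text);
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= text.length() && m.find(from)) {
            found.add(m.group());
            from = m.start() + 1;
        }
        return found;
    }

    public static void main(String[] args) {
        String run23 = "(?=(.))\\1{2}.*?(?=(?!\\1)(.))\\2{3}";
        System.out.println(findOverlapping("xx11yyy222", run23));
    }
}
```

On "xx11yyy222" this reports xx11yyy, 11yyy, yyy222 and yy222, whereas the plain find() loop in the harness stops after xx11yyy because that match has already consumed the 11 and yyy runs.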