Why does the order of alternatives matter in regex?
Problem description
Code
using System;
using System.Text.RegularExpressions;

namespace RegexNoMatch {
    class Program {
        static void Main () {
            string input = "a foobar& b";
            string regex1 = "(foobar|foo)&?";
            string regex2 = "(foo|foobar)&?";
            string replace = "$1";
            Console.WriteLine(Regex.Replace(input, regex1, replace));
            Console.WriteLine(Regex.Replace(input, regex2, replace));
            Console.ReadKey();
        }
    }
}
Expected output
a foobar b
a foobar b
Actual output
a foobar b
a foobar& b
Question
Why does replacing not work when the order of "foo" and "foobar" in regex pattern is changed? How to fix this?
Recommended answer
The regular expression engine tries the alternatives in the order in which they are written and takes the first one that lets the overall match succeed. With the pattern (foo|foobar)&? it matches foo at the start of foobar, then tries the optional &?, which matches the empty string because the next character is b. The overall match therefore succeeds with just foo, and the engine never goes back to try foobar. It then resumes searching in the rest of the input, bar& b, where neither alternative can match, so the & is left in the output.
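To see what each pattern actually consumes, it can help to list the matches rather than the replaced string. The following is a minimal sketch; the class name AlternationTrace and the helper Describe are hypothetical names chosen for illustration, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AlternationTrace
{
    // Hypothetical helper: returns every match as "index:value",
    // so the text consumed by each match is visible.
    public static string Describe(string input, string pattern)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            parts.Add(m.Index + ":" + m.Value);
        }
        return string.Join(", ", parts);
    }

    static void Main()
    {
        string input = "a foobar& b";
        // Longer alternative first: the match covers "foobar&".
        Console.WriteLine(Describe(input, "(foobar|foo)&?")); // 2:foobar&
        // Shorter alternative first: the match stops after "foo".
        Console.WriteLine(Describe(input, "(foo|foobar)&?")); // 2:foo
    }
}
```

With the second pattern the only match is foo at index 2, so replacing it with $1 leaves bar& untouched.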
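In other words, because foo is a prefix of foobar, (foo|foobar) will always match foo first. The foobar alternative is only tried if the rest of the pattern fails after foo and the engine has to backtrack, and here the rest of the pattern is the optional &?, which never fails. Listing the longer alternative first, as in (foobar|foo)&?, avoids the problem.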
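As a sketch of how the pattern could be fixed, the example below compares two rewrites and also shows that a mandatory & makes the engine backtrack into foobar. The class name AlternationFixes, the helper Strip, and the pattern (foo(?:bar)?)&? are illustrative assumptions, not something the original answer proposes.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationFixes
{
    // Hypothetical helper: replaces each match with the text of group 1.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "$1");
    }

    static void Main()
    {
        string input = "a foobar& b";

        // Longer alternative first (regex1 from the question).
        Console.WriteLine(Strip(input, "(foobar|foo)&?"));  // a foobar b

        // One alternative with an optional suffix, so ordering no longer matters.
        Console.WriteLine(Strip(input, "(foo(?:bar)?)&?")); // a foobar b

        // With a mandatory &, matching "foo" alone fails at "b",
        // so the engine backtracks and tries "foobar".
        Console.WriteLine(Regex.Match(input, "(foo|foobar)&").Value); // foobar&
    }
}
```

The optional-suffix form keeps working even if someone later reorders the alternatives, at the cost of being slightly harder to read.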
Occasionally, this ordering can actually be a useful trick. The pattern (o|a|(\w)) lets you treat a and o differently from other word characters: group 2 captures a \w character only when it is neither o nor a.
Regex.Replace("a foobar& b", "(o|a|(\\w))", "$2") // " fbr& b" (the leading space is kept)
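The same idea can be used with a match evaluator to transform only the characters that fell through to the last alternative. This is a minimal sketch under that assumption; the class name OrderedAlternationTrick and the method UppercaseOthers are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class OrderedAlternationTrick
{
    // Uppercases every word character except "o" and "a".
    // Group 2 participates only when the first two alternatives did not match.
    public static string UppercaseOthers(string input)
    {
        return Regex.Replace(
            input,
            "(o|a|(\\w))",
            m => m.Groups[2].Success ? m.Value.ToUpperInvariant() : m.Value);
    }

    static void Main()
    {
        Console.WriteLine(UppercaseOthers("a foobar& b")); // a FooBaR& B
    }
}
```

Checking Groups[2].Success is what distinguishes the two cases, since the whole match is a single character either way.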