Weird C# Regex behaviour
Problem description
Hello,
I cannot find a valid regular expression pattern for my needs.
I have a sample string like this:
I have four child of seven years each, [seven] years ago I had no child, because I was fourteen
Now, I want to match and then substitute the words "four" and "[seven]".
So I used a pattern like this:
\bfour\b|\b\[seven\]\b
(it uses word boundaries to match exact words; the square brackets are escaped so they match literally)
but only "four" is matched and substituted.
If I change the pattern to:
four|\[seven\]
"four" and "[seven]" are both matched. But because I have removed the word boundary anchor "\b", partial word matches can now happen ("four" inside "fourteen", for example), and this is not what I want.
Ultimately, it seems that "\b" is responsible for this strange behaviour, but I don't know why or how to solve it.
Any help is appreciated. Thanks.
Recommended answer
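[seven] does not fit the definition of the "word" character class (\w): "[" and "]" are non-word characters, so \b cannot match just outside them when they are surrounded by spaces, as in your sample. Try \bfour\b|\[seven\] instead.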
See here for details
I recommend downloading Expresso and experimenting with it.
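Since the original goal was substitution rather than just matching, here is a minimal sketch of the suggested pattern used with Regex.Replace, written as C# 9 top-level statements. The Mark helper and the <...> markers are illustrative assumptions, not part of the thread. It also shows a lookaround variant, (?<!\w)(?:four|\[seven\])(?!\w), which is likewise not from this thread: it rejects a token glued to a word character on either side, which the plain \[seven\] alternative does not do.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample string from the question.
string example = "I have four child of seven years each, [seven] years ago I had no child, because I was fourteen";

// Pattern suggested above: \b only around "four", whose first and last
// characters are word characters; "[seven]" is delimited by its own brackets.
string suggested = @"\bfour\b|\[seven\]";

// Alternative (an assumption, not from the thread): negative lookarounds reject
// a match preceded or followed by a word character, regardless of whether the
// token itself starts or ends with a word character.
string lookaround = @"(?<!\w)(?:four|\[seven\])(?!\w)";

// Hypothetical helper: wraps every match in <...> so the substitutions are visible.
string Mark(string input, string pattern) =>
    Regex.Replace(input, pattern, m => "<" + m.Value + ">");

// "four" and "[seven]" are replaced; "fourteen" is left alone.
Console.WriteLine(Mark(example, suggested));
Console.WriteLine(Mark(example, lookaround));

// The patterns differ only when "[seven]" touches word characters.
Console.WriteLine(Mark("a[seven]b", suggested));   // a<[seven]>b
Console.WriteLine(Mark("a[seven]b", lookaround));  // a[seven]b
```

On the sample sentence both patterns give the same result. The difference only shows up on input such as a[seven]b, where \[seven\] still matches because nothing constrains its neighbours.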
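Let me elaborate a bit on what Catalin already said. \w is the class of word characters, essentially [A-Za-z0-9_] (in .NET it also covers Unicode letters, decimal digits, nonspacing marks and connector punctuation unless RegexOptions.ECMAScript is set). \b matches a position that has a word character on one side and a non-word character, or the start or end of the string, on the other, so a word boundary can only occur right next to a word character. Because "[" and "]" are non-word characters, a \b placed directly outside them needs a word character on the far side, and the sample has a space there. The code below illustrates this quite nicely: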
using System;
using System.Text.RegularExpressions;

namespace TestSupportService
{
    class Program
    {
        static void Main(string[] args)
        {
            String example = "I have four child of seven years each, [seven] years ago I had no child, because I was fourteen";

            // \b next to the letters of "seven": the boundaries fall between
            // '[' and 's' and between 'n' and ']', so "[seven]" is matched.
            Regex rexWillDo = new Regex(@"\bfour\b|\[\bseven\b\]");

            // \b outside the brackets: "[seven]" has a space on each side, and
            // space and bracket are both non-word characters, so no boundary
            // exists there and only "four" is matched.
            Regex rexWontDo = new Regex(@"\bfour\b|\b\[seven\]\b");

            Console.WriteLine("Now you see it!");
            MatchCollection matches = rexWillDo.Matches(example);
            foreach (Match match in matches)
            {
                Console.WriteLine(match.Value);
            }

            Console.WriteLine("\nAnd now you don't!");
            matches = rexWontDo.Matches(example);
            foreach (Match match in matches)
            {
                Console.WriteLine(match.Value);
            }

            Console.ReadLine();
        }
    }
}
So by moving the word boundary assertions next to (real) word characters, the expression works. I admit that I did not expect that kind of behavior either. Regular expressions usually work quite nicely for me, but once in a while they rear their ugly head and bite us. :( Note that this is not specific to Microsoft's implementation: \b works the same way in Perl, PCRE, Java and JavaScript.
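To see exactly where \b can and cannot match around "[seven]", here is a minimal sketch that evaluates the boundary condition by hand, as C# 9 top-level statements. IsWordChar and IsWordBoundary are hypothetical helpers written for this illustration (they reuse Regex's own \w class), not framework APIs; the last two lines cross-check the result against the two patterns from the program above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the regex \w class matches character c.
bool IsWordChar(char c) => Regex.IsMatch(c.ToString(), @"\w");

// Hypothetical helper: \b holds at position i (between s[i - 1] and s[i]) when
// exactly one side is a word character; positions outside the string count as non-word.
bool IsWordBoundary(string s, int i)
{
    bool left = i > 0 && IsWordChar(s[i - 1]);
    bool right = i < s.Length && IsWordChar(s[i]);
    return left != right;
}

string text = "each, [seven] years";
int open = text.IndexOf('[');       // position just before '['
int close = text.IndexOf(']') + 1;  // position just after ']'

Console.WriteLine($"before '[': {IsWordBoundary(text, open)}");       // False (space | '[')
Console.WriteLine($"after '[': {IsWordBoundary(text, open + 1)}");    // True  ('[' | 's')
Console.WriteLine($"before ']': {IsWordBoundary(text, close - 1)}");  // True  ('n' | ']')
Console.WriteLine($"after ']': {IsWordBoundary(text, close)}");       // False (']' | space)

// Same conclusion from the regex engine itself.
Console.WriteLine(Regex.IsMatch(text, @"\b\[seven\]\b"));  // False
Console.WriteLine(Regex.IsMatch(text, @"\[\bseven\b\]"));  // True
```

Only the two positions inside the brackets are boundaries, which is why the \b assertions have to sit next to the letters.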
Cheers!