Weird C# Regex behaviour


Question

Hello,

I cannot find a valid regular expression pattern for my needs.

I have a sample string like this:

I have four child of seven years each, [seven] years ago I had no child, because I was fourteen

Now, I want to match and then substitute the words "four" and "[seven]".

So I have used a pattern like:

\bfour\b|\b\[seven\]\b

(searches using word boundaries to match exact words. Square brackets are escaped to match them literally)

but only "four" is matched and substituted.

If I change the pattern to:

four|\[seven\]


both "four" and "[seven]" are matched. But because I have removed the word boundary anchor "\b", partial word matches can now happen (the "four" inside "fourteen", for example), and this is not what I want.

Ultimately it seems that "\b" has to do with this strange behaviour, but I don't know why or how to solve it.

Any help is appreciated. Thanks.

Recommended answer

The brackets in [seven] are not "word characters", so in your string there is no word boundary before "[" or after "]", and \b fails there. Try \bfour\b|\[seven\] instead.
See here for details

I recommend downloading Expresso and playing with it.
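As a rough sketch of how the suggested pattern could be used for the substitution itself: the replacement values "4" and "[7]" and the names ReplaceDemo and Substitute are made up for illustration, since the question does not say what the words should become.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Hypothetical helper: replaces the whole word "four" and the literal "[seven]".
    // The replacement values are assumptions, not taken from the question.
    public static string Substitute(string input)
    {
        // \b guards "four" on both sides; "[seven]" is already delimited by its brackets.
        return Regex.Replace(input, @"\bfour\b|\[seven\]",
            m => m.Value == "four" ? "4" : "[7]");
    }

    static void Main()
    {
        string example = "I have four child of seven years each, [seven] years ago I had no child, because I was fourteen";
        // "fourteen" and the bare "seven" are left untouched.
        Console.WriteLine(Substitute(example));
    }
}
```

With the sample string this should print "I have 4 child of seven years each, [7] years ago I had no child, because I was fourteen".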


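Let me elaborate a bit on what Catalin already said. \w is the class of word characters, roughly "[A-Za-z0-9_]" (by default .NET also counts Unicode letters and digits). A word boundary \b can occur only between a \w character and a non-\w character (or the start or end of the string), so never between a space and "[". The code below illustrates this quite nicely: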

using System;
using System.Text.RegularExpressions;

namespace TestSupportService
{
    class Program
    {
        static void Main(string[] args)
        {
            String example = "I have four child of seven years each, [seven] years ago I had no child, because I was fourteen";

            // \b sits between "[" and "s" and between "n" and "]", where a boundary really exists.
            Regex rexWillDo = new Regex(@"\bfour\b|\[\bseven\b\]");
            // \b sits between a space and a bracket, where no boundary exists.
            Regex rexWontDo = new Regex(@"\bfour\b|\b\[seven\]\b");

            // Prints "four" and "[seven]".
            Console.WriteLine("Now you see it!");
            MatchCollection matches = rexWillDo.Matches(example);
            foreach (Match match in matches)
            {
                Console.WriteLine(match.Value);
            }

            // Prints only "four".
            Console.WriteLine("\nAnd now you don't!");
            matches = rexWontDo.Matches(example);
            foreach (Match match in matches)
            {
                Console.WriteLine(match.Value);
            }
            Console.ReadLine();
        }
    }
}




So by moving the word boundary detectors next to (real) word characters the expression works. I do admit that I also did not expect that kind of behavior. Regular expressions usually work quite nicely for me, but once in a while a detail like this rears its ugly head and bites us. :(

Cheers!

—MRB
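If the tokens can start or end with arbitrary punctuation, one possible alternative to hand-placing \b is to replace it with lookarounds that only check that no word character is adjacent. This is a sketch of that idea, not something proposed in the answers above; the names TokenBoundaryDemo and BuildWholeTokenRegex are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class TokenBoundaryDemo
{
    // Hypothetical helper: matches any of the literal tokens, but only when the
    // token is not directly preceded or followed by a word character.
    public static Regex BuildWholeTokenRegex(params string[] tokens)
    {
        // Regex.Escape makes characters such as "[" match literally.
        string alternatives = string.Join("|", tokens.Select(t => Regex.Escape(t)));
        // (?<!\w) and (?!\w) stand in for \b without requiring the token itself
        // to begin or end with a word character.
        return new Regex(@"(?<!\w)(?:" + alternatives + @")(?!\w)");
    }

    static void Main()
    {
        string example = "I have four child of seven years each, [seven] years ago I had no child, because I was fourteen";
        Regex rex = BuildWholeTokenRegex("four", "[seven]");
        foreach (Match match in rex.Matches(example))
        {
            // Expected matches: "four" and "[seven]"; "fourteen" is skipped.
            Console.WriteLine(match.Value);
        }
    }
}
```

The trade-off is that these lookarounds treat any non-word neighbour (space, comma, bracket) as an acceptable edge, which may or may not be what you want for your data.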

