Find all matches with an extra qualifying criterion


Problem description

Given sentences such as the following:

Boy has a dog and a cat.
Boy microwaves a gerbil.
Sally owns a cat.

For each sentence where "Boy" is the first word, I want a list of the animals it contains (an animal being 'dog', 'cat' or 'gerbil'). For the list above that would be:

['dog', 'cat']
['gerbil']
3rd sentence would not match.

The regex

dog|cat|gerbil

will return all matches, but they are not specific to "Boy" (the third sentence would return an undesirable 'cat'). The regex

^Boy.*(dog|cat|gerbil)

returns the entire phrase up to the last matching animal, such as "Boy has a dog and a cat", because .* is greedy, and its first and only group holds just "cat".

How do I get the list of all animals associated with "Boy" (that is, animals in sentences starting with "Boy")?

Recommended answer

You may use a positive lookbehind (the .NET regex engine allows variable-length patterns inside lookbehinds):

(?<=^Boy.*?)(?:dog|cat|gerbil)

Or, a variation with word boundaries to match the animals as whole words:

(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b

The (?<=^Boy.*?) positive lookbehind requires Boy at the start of the string for the consuming pattern to match.

If your input contains LF (newline) characters, pass the RegexOptions.Singleline option so that . matches newlines, too.
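To illustrate how the options affect input that holds several sentences in one string, here is a minimal sketch. The FindAnimals helper and the LineAwareDemo class are hypothetical names introduced for this example, and the use of RegexOptions.Multiline is an addition that the answer itself does not mention. With Singleline, the whole text is treated as one "Boy" string, so even the animal on the "Sally" line is returned. With Multiline, ^ also matches at each line start and . still stops at newlines, so only lines that begin with Boy contribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineAwareDemo
{
    // Hypothetical helper (not part of the original answer): returns every
    // animal matched by the lookbehind pattern under the given options.
    public static List<string> FindAnimals(string text, RegexOptions options)
    {
        return Regex.Matches(text, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b", options)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        string text = "Boy has a dog and a cat.\nBoy microwaves a gerbil.\nSally owns a cat.";

        // Singleline: . crosses newlines, ^ matches only at the very start.
        Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.Singleline)));
        // prints: dog, cat, gerbil, cat

        // Multiline: ^ matches at every line start, . stops at newlines.
        Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.Multiline)));
        // prints: dog, cat, gerbil
    }
}
```

If each sentence is already a separate string, as in the demo that follows, neither option is needed.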

C# usage:

var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();

C# demo:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var strs = new List<string>() { "Boy has a dog and a cat.",
        "Boy something a gerbil.",
        "Sally owns a cat." };
foreach (var s in strs)
{
    // Collect every animal that is preceded by "Boy" at the start of the string
    var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    if (results.Count > 0)
    {
        Console.WriteLine("{0}:\n[{1}]\n------", s, string.Join(", ", results));
    }
    else
    {
        Console.WriteLine("{0}:\nNO MATCH!\n------", s);
    }
}

Output:

Boy has a dog and a cat.:
[dog, cat]
------
Boy something a gerbil.:
[gerbil]
------
Sally owns a cat.:
NO MATCH!
------

There is an alternative: match a string starting with Boy up to the first animal, and then let each subsequent match start only where the previous successful match ended:

(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b

You would just need to grab the Group 1 contents:

var results = Regex.Matches(s, @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();

Pattern details:

  • (?:\G(?!\A)|^Boy\b) - either the end of the previous match (\G(?!\A)) or the start of the string followed by the whole word Boy
  • .*? - any 0+ chars other than a newline (unless RegexOptions.Singleline is passed to the Regex constructor), as few as possible
  • \b(dog|cat|gerbil)\b - a whole word dog, cat or gerbil, captured into Group 1
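The chaining described in the list above can be made visible by printing where each match starts and ends. This is only an illustrative sketch: the ChainedMatchDemo class and its Trace method are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ChainedMatchDemo
{
    // Hypothetical helper: describes each match as
    // "[start..end) 'matched text' -> captured animal".
    public static List<string> Trace(string input)
    {
        var regex = new Regex(@"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b");
        var lines = new List<string>();
        foreach (Match m in regex.Matches(input))
        {
            lines.Add(string.Format("[{0}..{1}) '{2}' -> {3}",
                m.Index, m.Index + m.Length, m.Value, m.Groups[1].Value));
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Trace("Boy has a dog and a cat."))
        {
            Console.WriteLine(line);
        }
        // prints:
        // [0..13) 'Boy has a dog' -> dog
        // [13..23) ' and a cat' -> cat
    }
}
```

The second match begins at index 13, exactly where the first one ended, which is what \G(?!\A) enforces. For "Sally owns a cat." the first alternative is rejected at the start of the string by (?!\A) and ^Boy\b fails, so no chain ever begins and the list stays empty.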

Basically, these regexes are similar, although the \G-based regex might turn out to be a bit faster.
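Whether one pattern is actually faster depends on the input and the runtime, so it is worth measuring. The following is a rough timing sketch, not a rigorous benchmark: the RegexTimingSketch class, the Measure method and the iteration count are all assumptions made for illustration, and no particular result is implied.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTimingSketch
{
    // Hypothetical helper: runs the pattern over the input repeatedly
    // and returns the total elapsed time.
    public static TimeSpan Measure(string pattern, string input, int iterations)
    {
        var regex = new Regex(pattern);
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // MatchCollection is evaluated lazily; reading Count forces
            // the engine to find every match.
            _ = regex.Matches(input).Count;
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        const string input = "Boy has a dog and a cat.";
        const int iterations = 100000; // arbitrary value chosen for the sketch

        var lookbehind = Measure(@"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b", input, iterations);
        var chained = Measure(@"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b", input, iterations);

        Console.WriteLine("Lookbehind: {0} ms", lookbehind.TotalMilliseconds);
        Console.WriteLine("\\G-based:   {0} ms", chained.TotalMilliseconds);
    }
}
```

Timings on a single short sentence are noisy; longer inputs and several runs give a more meaningful comparison.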
