Find all matches with an extra qualifying criterion
Question
Given sentences such as the following:
Boy has a dog and a cat.
Boy microwaves a gerbil.
Sally owns a cat.
For each sentence, I want a list of animals (defined as 'dog', 'cat', or 'gerbil') where "Boy" is the first word. For the list above, that would be:
['dog', 'cat']
['gerbil']
The third sentence would not match.
The regex:
dog|cat|gerbil
This returns all matches, but they are not specific to "Boy" (the third sentence would yield an unwanted 'cat'). The regex:
^Boy.*(dog|cat|gerbil)
This returns the entire phrase up to the last matching animal, such as "Boy has a dog and a cat", with "cat" as the first and only group.
How do I get the list of all animals associated with "Boy" (that is, animals in sentences starting with "Boy")?
Recommended answer
You may use a positive lookbehind (the .NET regex engine supports the variable-length lookbehind used here):
(?<=^Boy.*?)(?:dog|cat|gerbil)
Or, a variation with word boundaries to match the animals as whole words:
(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b
The (?<=^Boy.*?) positive lookbehind requires Boy at the start of the string for the consuming pattern to match.
If your input contains LF (newline) characters, pass the RegexOptions.Singleline option so that . matches newlines, too.
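To make the effect of the options concrete, here is a minimal sketch on a hypothetical two-line input. The sample text and the FindAnimals helper name are assumptions for illustration, not part of the original answer. Note that with RegexOptions.Singleline the ^ anchor still matches only at the very start of the string, so animals on later lines are picked up as well; if each line is meant to be a separate sentence, RegexOptions.Multiline (where ^ matches at the start of every line) may be closer to what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: a "Boy" sentence followed by a "Sally" sentence on the next line.
const string text = "Boy has a dog.\nSally owns a cat.";
const string pattern = @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b";

// Illustrative helper: collect all match values for the given options.
List<string> FindAnimals(string input, RegexOptions options) =>
    Regex.Matches(input, pattern, options)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToList();

// Default: . stops at the newline, so only the first line is searched.
Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.None)));       // dog

// Singleline: . crosses the newline, so "cat" on the second line matches too.
Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.Singleline))); // dog, cat

// Multiline: ^ matches at each line start, but the second line starts with "Sally".
Console.WriteLine(string.Join(", ", FindAnimals(text, RegexOptions.Multiline)));  // dog
```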
C# usage:
var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();
C# demo:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var strs = new List<string>() { "Boy has a dog and a cat.",
                                "Boy something a gerbil.",
                                "Sally owns a cat." };
foreach (var s in strs)
{
    // Collect every animal that is preceded by "Boy" at the start of the string
    var results = Regex.Matches(s, @"(?<=^Boy\b.*?)\b(?:dog|cat|gerbil)\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToList();
    if (results.Count > 0)
    {
        Console.WriteLine("{0}:\n[{1}]\n------", s, string.Join(", ", results));
    }
    else
    {
        Console.WriteLine("{0}:\nNO MATCH!\n------", s);
    }
}
Output:
Boy has a dog and a cat.:
[dog, cat]
------
Boy something a gerbil.:
[gerbil]
------
Sally owns a cat.:
NO MATCH!
------
There is an alternative: match from Boy at the start of the string, and then continue matching only from the end of each previous successful match:
(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b
You would just need to grab the Group 1 contents:
var results = Regex.Matches(s, @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
Here,
- (?:\G(?!\A)|^Boy\b): either the end of the previous match (\G(?!\A)) or the start of the string followed by the whole word Boy
- .*?: any 0+ chars other than a newline (if RegexOptions.Singleline is not passed to the Regex constructor), as few as possible
- \b(dog|cat|gerbil)\b: a whole word dog, cat or gerbil
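The breakdown above can be traced with a short sketch that prints where each match starts, what the full match consumed, and what Group 1 captured. The variable names are illustrative; the sample sentences are the ones from the question. It shows why Group 1 is needed: each full match also includes the text skipped by .*?.

```csharp
using System;
using System.Text.RegularExpressions;

const string gPattern = @"(?:\G(?!\A)|^Boy\b).*?\b(dog|cat|gerbil)\b";

// The first match starts at ^Boy; the second starts where the first one ended (\G).
var matches = Regex.Matches("Boy has a dog and a cat.", gPattern);
foreach (Match m in matches)
{
    Console.WriteLine($"index {m.Index}: full match \"{m.Value}\", group 1 \"{m.Groups[1].Value}\"");
}
// index 0: full match "Boy has a dog", group 1 "dog"
// index 13: full match " and a cat", group 1 "cat"

// No "Boy" at the start: \G(?!\A) is rejected at position 0 and ^Boy\b fails, so nothing matches.
int sallyCount = Regex.Matches("Sally owns a cat.", gPattern).Count;
Console.WriteLine(sallyCount); // 0
```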
Basically, these regexes are similar, although the \G-based regex might turn out to be a bit faster.