Regex including what is supposed to be non-capturing group in result


Question

I have the following simple test, where I'm trying to write a Regex pattern that extracts the executable name without the ".exe" suffix.

It appears that my non-capturing group, (?:\.exe), isn't working, or I'm misunderstanding how it is intended to work.

Both regex101 and regexstorm.net show the same result, and the former confirms that "(?:\.exe)" is a non-capturing group.

Any thoughts on what I'm doing wrong?

// test variable for what I would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(testEcl, @"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase).Value;
// expecting "MyApp" but I get "MyApp.exe"

I have been able to extract the value I wanted by using a pattern with named groups, as shown below, but I would like to understand why the non-capturing group approach didn't work the way I expected it to.

// test variable for what I would otherwise acquire from Environment.CommandLine
var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = Regex.Match(testEcl, @"(?<fname>[^\\]+)(?<ext>\.exe)",
    RegexOptions.IgnoreCase).Groups["fname"].Value;
// get the desired "MyApp" result


Recommended answer

A (?:...) is a non-capturing group: it still matches and consumes text. The part of the text this group matches is therefore still included in the overall match value; "non-capturing" only means that no separate group is stored for it.
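To make the difference concrete, here is a minimal sketch that reuses the pattern and test string from the question (written as top-level statements, which assumes C# 9 or later). It prints the overall match and the number of groups the match exposes:

```csharp
using System;
using System.Text.RegularExpressions;

var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var m = Regex.Match(testEcl, @"[^\\]+(?:\.exe)", RegexOptions.IgnoreCase);

// The non-capturing group consumed ".exe", so it is part of the overall match.
Console.WriteLine(m.Value);        // MyApp.exe

// Only group 0 (the whole match) exists; (?:...) did not create group 1.
Console.WriteLine(m.Groups.Count); // 1
```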

In general, if you want to match something without consuming it, you need to use lookarounds. So, if you need to match something that is followed by a specific string, use a positive lookahead, the (?=...) construct:

some_pattern(?=specific string) // if specific string comes immediately after pattern
some_pattern(?=.*specific string) // if specific string comes anywhere after pattern
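Applied to the question's test string, the lookahead form might look like the sketch below. The pattern [^\\]+(?=\.exe) is an illustration of the construct rather than a pattern taken from the question, and the snippet again assumes C# 9+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// (?=\.exe) checks that ".exe" follows but does not add it to the match.
var asmName = Regex.Match(testEcl, @"[^\\]+(?=\.exe)", RegexOptions.IgnoreCase).Value;

Console.WriteLine(asmName); // MyApp
```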

If you need to match something but exclude some specific preceding text from the match, use a positive lookbehind:

(?<=specific string)some_pattern // if specific string comes immediately before pattern
(?<=specific string.*?)some_pattern // if specific string comes anywhere before pattern
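As a rough illustration of both lookbehind forms in .NET, the sketch below runs them against the question's test string. The two patterns are hypothetical examples chosen for this path, not part of the original answer, and the snippet assumes C# 9+ top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";

// Fixed-width lookbehind: "\Debug\" must come immediately before the match.
var direct = Regex.Match(testEcl, @"(?<=\\Debug\\)[^\\]+(?=\.exe)",
    RegexOptions.IgnoreCase).Value;

// Variable-width lookbehind: "bin\" may come anywhere before the match.
var anywhere = Regex.Match(testEcl, @"(?<=bin\\.*?)[^\\]+(?=\.exe)",
    RegexOptions.IgnoreCase).Value;

Console.WriteLine(direct);   // MyApp
Console.WriteLine(anywhere); // MyApp
```

In both cases the text inside the lookarounds is required to be present but is left out of the match value.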

Note that quantified patterns such as .*? or .* (that is, patterns with the *, +, ?, {2,} or even {1,3} quantifiers) inside lookbehinds are not supported by all regex engines. Fortunately, the C# .NET regex engine supports them. They are also supported by the PyPI regex module for Python, Vim, JGSoft software and, now, ECMAScript 2018 compliant JavaScript environments.

In this case, you may capture what you need to get and just match the context without capturing it:

var testEcl = "\"D:\\src\\repos\\myprj\\bin\\Debug\\MyApp.exe\" /?";
var asmName = string.Empty; 
var m = Regex.Match(testEcl, @"([^\\]+)\.exe", RegexOptions.IgnoreCase);
if (m.Success)
{
    asmName = m.Groups[1].Value;
}
Console.WriteLine(asmName);

See the C# demo.

Details

  • ([^\\]+) - Capturing group 1: one or more chars other than \
  • \. - a literal dot
  • exe - a literal exe substring.

Since we are only interested in the contents of capturing group #1, we grab m.Groups[1].Value, and not the whole m.Value (which contains .exe).
