.NET Regex parsing of the newline character


Question

I'm facing a problem. My string can contain a special character, the newline sequence '\r\n'.

Part of my regex:

string sRegex = "(?<string>\"+.*\"|'+.*')";

How should I modify this regex to exclude newlines from my string?

Thanks for your help.

Recommended answer

Are you saying you want to match quoted strings only if they don't contain newlines? If so, you don't have to do anything special, because the dot doesn't match newlines by default (in .NET it matches any character except \n). Aside from the + after the opening quotes (which makes no sense to me), your regex should work fine. But I second Jay's suggestion that you use verbatim string literals for writing regexes:

Regex sRegex = new Regex(@"(?<string>"".*""|'.*')");
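As a quick check of the default dot behavior, here is a minimal sketch that runs the pattern above against one quoted string on a single line and one that is interrupted by \r\n. The class name, the helper method and the sample inputs are assumptions made for illustration; they are not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotDefaultDemo
{
    // Same pattern as above. With no options set, the dot matches
    // any character except \n (it does match \r).
    private static readonly Regex QuotedString =
        new Regex(@"(?<string>"".*""|'.*')");

    // Hypothetical helper: true if the input contains a quoted string
    // that opens and closes on the same line.
    public static bool HasQuotedString(string input) => QuotedString.IsMatch(input);

    public static void Main()
    {
        // Quoted text on a single line.
        Console.WriteLine(HasQuotedString("var s = \"foo\";"));     // prints True

        // Quoted text interrupted by \r\n: the dot stops at \n,
        // so the closing quote is never reached.
        Console.WriteLine(HasQuotedString("var s = \"fo\r\no\";")); // prints False
    }
}
```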

What you do need to watch out for is greediness. For example, if there are two string declarations on the same line, like this:

var s1 = "foo", s2 = "bar";

...the regex will find one match, "foo", s2 = "bar", where you expected it to match "foo" and "bar" separately. To avoid that, you can use a non-greedy quantifier:

Regex sRegex = new Regex(@"(?<string>"".*?""|'.*?')");
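The difference is easy to see by running both patterns over the sample line. This is only a sketch; GreedinessDemo and FindAll are illustrative names, not anything from the original post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreedinessDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] FindAll(string pattern, string input) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        const string line = "var s1 = \"foo\", s2 = \"bar\";";

        // Greedy: .* runs on to the last quote on the line, giving one match.
        foreach (string m in FindAll(@"(?<string>"".*""|'.*')", line))
            Console.WriteLine(m);   // "foo", s2 = "bar"

        // Non-greedy: .*? stops at the first closing quote, giving two matches.
        foreach (string m in FindAll(@"(?<string>"".*?""|'.*?')", line))
            Console.WriteLine(m);   // "foo" and then "bar"
    }
}
```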



If you do want to match strings with newlines in them, you can use the Singleline option, which modifies the behavior of the dot, enabling it to match newlines.

Regex sRegex = new Regex(@"(?<string>"".*?""|'.*?')",
                         RegexOptions.Singleline);

...or you can use the inline modifier:

Regex sRegex = new Regex(@"(?s)(?<string>"".*?""|'.*?')");
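To compare the two spellings, the sketch below tries the same non-greedy pattern in default mode, with RegexOptions.Singleline, and with the inline (?s) modifier. The class, the helper and the input string are assumptions chosen for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    public const string Pattern = @"(?<string>"".*?""|'.*?')";

    // A quoted string whose content spans two lines.
    public const string Input = "var s = \"line one\r\nline two\";";

    // Hypothetical helper: true if the pattern matches Input under the given options.
    public static bool Matches(string pattern, RegexOptions options) =>
        Regex.IsMatch(Input, pattern, options);

    public static void Main()
    {
        // Default mode: the dot cannot cross \n, so nothing matches.
        Console.WriteLine(Matches(Pattern, RegexOptions.None));          // prints False

        // Singleline option: the dot also matches \n.
        Console.WriteLine(Matches(Pattern, RegexOptions.Singleline));    // prints True

        // Inline modifier: same effect without passing an option.
        Console.WriteLine(Matches("(?s)" + Pattern, RegexOptions.None)); // prints True
    }
}
```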

Be aware that when you use the dot in Singleline mode, it's especially important that you use a non-greedy quantifier, since potential matches are no longer confined to a single line. But here's another alternative that's more efficient as well as more predictable:

Regex sRegex = new Regex(@"(?<string>""[^""]*""|'[^']*')");

There's no need to specify Singleline mode with this regex because you aren't using the dot metacharacter. The negated character class [^"] matches any character except a quotation mark, including newlines.
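A small sketch of the negated-character-class version, run with no options at all. The input, which mixes a multi-line double-quoted string with a single-quoted one, is made up for illustration, as are the class and method names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // No RegexOptions needed: [^"] and [^'] match newlines by themselves.
    private static readonly Regex QuotedString =
        new Regex(@"(?<string>""[^""]*""|'[^']*')");

    // Hypothetical helper: returns every quoted string found in input.
    public static string[] FindStrings(string input) =>
        QuotedString.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string input = "var s1 = \"foo\r\nbar\", s2 = 'baz';";

        foreach (string s in FindStrings(input))
        {
            // Show the line break as an escape so each match prints on one line.
            Console.WriteLine(s.Replace("\r\n", "\\r\\n"));
        }
        // prints "foo\r\nbar" and then 'baz'
    }
}
```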


Finally, I'd like to say a word about the Multiline option, as there seems to be a lot of confusion about it. People tend to assume that you have to use it whenever the target text is composed of multiple lines (i.e., whenever it contains newline characters). That's a natural assumption, but it's not true.

All Multiline mode does is change the behavior of the start and end anchors, ^ and $. Normally they only match the beginning and end of the whole string, but if you turn on Multiline mode they also match at the beginning and end of logical lines within the string. For example, given a string declared like this:

"fee fie\nfoe fum"

If you search for the regex ^\w+ in default mode you'll get one match: fee. But if you switch to Multiline mode you'll get two: fee and foe. Similarly, \w+$ matches only fum in default mode, but it matches fie and fum in Multiline mode. And you can always match a literal \n no matter what mode you're in: Singleline, Multiline or default.
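The results described above can be reproduced with a few lines of C#. This is a sketch only; MultilineDemo and FindAll are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    public const string Text = "fee fie\nfoe fum";

    // Hypothetical helper: all matches of pattern in Text, joined with commas.
    public static string FindAll(string pattern, RegexOptions options) =>
        string.Join(",", Regex.Matches(Text, pattern, options)
                              .Cast<Match>()
                              .Select(m => m.Value));

    public static void Main()
    {
        Console.WriteLine(FindAll(@"^\w+", RegexOptions.None));      // fee
        Console.WriteLine(FindAll(@"^\w+", RegexOptions.Multiline)); // fee,foe
        Console.WriteLine(FindAll(@"\w+$", RegexOptions.None));      // fum
        Console.WriteLine(FindAll(@"\w+$", RegexOptions.Multiline)); // fie,fum

        // A literal \n in the pattern matches in every mode.
        Console.WriteLine(Regex.IsMatch(Text, @"\n"));               // True
    }
}
```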

People also tend to assume Singleline and Multiline are mutually exclusive, which they aren't. I've even seen people say Singleline is the default mode; also not true. Singleline changes the behavior of the dot (.), Multiline changes the behavior of the anchors (^ and $); that's all.
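Because the two options affect different things, they can be combined with the | operator. The sketch below uses a made-up pattern and a made-up three-line input that match only when both options are on: Multiline lets ^ match at the start of the second line, and Singleline lets the dot cross the newline before "end".

```csharp
using System;
using System.Text.RegularExpressions;

public static class CombinedOptionsDemo
{
    public const string Text = "fee fie\nfoe fum\nend";
    public const string Pattern = @"^foe.*end$";

    // Hypothetical helper: true if Pattern matches Text under the given options.
    public static bool Matches(RegexOptions options) =>
        Regex.IsMatch(Text, Pattern, options);

    public static void Main()
    {
        Console.WriteLine(Matches(RegexOptions.None));       // False
        Console.WriteLine(Matches(RegexOptions.Singleline)); // False: ^ only matches at the very start
        Console.WriteLine(Matches(RegexOptions.Multiline));  // False: the dot cannot cross \n
        Console.WriteLine(Matches(RegexOptions.Singleline | RegexOptions.Multiline)); // True
    }
}
```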
