C# remove line using regular expression, including line break

Problem description

I need to remove lines that match a particular pattern from some text. One way to do this is to use a regular expression with the begin/end anchors, like so:

var re = new Regex("^pattern$", RegexOptions.Multiline);
string final = re.Replace(initial, "");

This works fine except that it leaves an empty line instead of removing the entire line (including the line break).
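To make the symptom concrete, here is a minimal sketch of the behaviour described above. The class and method names (EmptyLineDemo, RemoveLineTextOnly) are hypothetical and only serve the illustration.

```csharp
using System.Text.RegularExpressions;

public static class EmptyLineDemo
{
    // Removes the text of every line equal to "pattern",
    // but leaves the line break that follows it in place.
    public static string RemoveLineTextOnly(string initial)
    {
        var re = new Regex("^pattern$", RegexOptions.Multiline);
        return re.Replace(initial, "");
    }
}
```

For the input "first\npattern\nlast" this returns "first\n\nlast": the matched text is gone, but its line break stays behind as an empty line.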

To solve this, I added an optional capturing group for the line break, but I want to be sure it includes all of the different flavors of line breaks, so I did it like so:

var re = new Regex(@"^pattern$(\r\n|\r|\n)?", RegexOptions.Multiline);
string final = re.Replace(initial, "");

This works, but it seems like there should be a more straightforward way to do this. Is there a simpler way to reliably remove the entire line including the ending line break (if any)?

Recommended answer

To match any single line break sequence, you may use the (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029]) pattern. So, instead of (\r\n|\r|\n)?, you can use (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?.

Details:

  • 000A - a newline, \n
  • 000B - a line tabulation char
  • 000C - a form feed char
  • 000D - a carriage return, \r
  • 0085 - a next line char, NEL
  • 2028 - a line separator char
  • 2029 - a paragraph separator char
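As a quick check of the line break pattern, the following sketch counts line break sequences in a string. The names LineBreakDemo, AnyLineBreak and CountLineBreaks are made up for this example. Because \r\n is listed first in the alternation, a CRLF pair counts as one break rather than two.

```csharp
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // CRLF is tried first so that it is consumed as a single line break.
    public const string AnyLineBreak =
        @"(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])";

    // Counts how many line break sequences occur in the text.
    public static int CountLineBreaks(string text) =>
        Regex.Matches(text, AnyLineBreak).Count;
}
```

For example, CountLineBreaks("a\r\nb\nc\rd\u2028e") gives 4, while the reversed pair "\n\r" counts as 2 separate breaks.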

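If you want to remove any 0+ non-horizontal (that is, vertical) whitespace chars after a matched line, you may use [\s-[\p{Zs}\t]]*: any whitespace (\s) except (-[...]) horizontal whitespace (matched with [\p{Zs}\t]). Note that this also removes any blank lines that directly follow the matched line. The \p{Zs} Unicode category class does not match tab chars because a tab belongs to the Cc (control) category, which is why \t is listed explicitly.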
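The character class subtraction can be probed one character at a time. This is only an illustrative sketch; WhitespaceClassDemo and its two methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceClassDemo
{
    // True when the whole string is one vertical (non-horizontal) whitespace char.
    public static bool IsVerticalWhitespace(string ch) =>
        Regex.IsMatch(ch, @"^[\s-[\p{Zs}\t]]\z");

    // True when the whole string is one char of the Zs (space separator) category.
    public static bool IsSpaceSeparator(string ch) =>
        Regex.IsMatch(ch, @"^\p{Zs}\z");
}
```

With this, "\n", "\r" and "\u2028" count as vertical whitespace, while " ", "\t" and the no-break space "\u00A0" do not. IsSpaceSeparator("\t") is false, which is why the tab has to be subtracted separately.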

One more aspect must be dealt with here since you are using the RegexOptions.Multiline option: it makes $ match before a newline (\n) or at the end of the string, but not before a carriage return (\r). That is why the pattern fails to match any line that ends in CRLF. Hence, add an optional \r? before $ in your pattern.
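The effect of CRLF line endings on $ can be seen in a small sketch like the one below. CrlfAnchorDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CrlfAnchorDemo
{
    // Original anchor pair: $ only matches before \n or at the end of the string.
    public static bool MatchesWithoutCr(string text) =>
        Regex.IsMatch(text, "^pattern$", RegexOptions.Multiline);

    // Same pattern with an optional \r consumed before $.
    public static bool MatchesWithOptionalCr(string text) =>
        Regex.IsMatch(text, @"^pattern\r?$", RegexOptions.Multiline);
}
```

For "a\r\npattern\r\nb" the first method returns false and the second returns true; for the LF-only text "a\npattern\nb" both return true.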

So, use either

@"^pattern\r?$(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?"

or

@"^pattern\r?$[\s-[\p{Zs}\t]]*"
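Putting the two final patterns to work, a possible helper might look like the sketch below. LineRemover and its method names are assumptions made for this example, and linePattern is expected to be a regex fragment that matches the content of a single line.

```csharp
using System.Text.RegularExpressions;

public static class LineRemover
{
    private const string AnyLineBreak =
        @"(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])";

    // Removes each matching line together with the single line break after it.
    public static string RemoveLines(string text, string linePattern)
    {
        var re = new Regex(
            "^(?:" + linePattern + @")\r?$" + AnyLineBreak + "?",
            RegexOptions.Multiline);
        return re.Replace(text, "");
    }

    // Removes each matching line together with all vertical whitespace
    // after it, so blank lines that directly follow are removed as well.
    public static string RemoveLinesAndBlankLines(string text, string linePattern)
    {
        var re = new Regex(
            "^(?:" + linePattern + @")\r?$[\s-[\p{Zs}\t]]*",
            RegexOptions.Multiline);
        return re.Replace(text, "");
    }
}
```

RemoveLines("keep\r\ndrop\r\nkeep2", "drop") returns "keep\r\nkeep2", and RemoveLinesAndBlankLines("a\r\ndrop\r\n\r\nb", "drop") returns "a\r\nb". When the matched line is the last one and has no trailing break, the break before it is left in place.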
