C# remove line using regular expression, including line break
Question
I need to remove lines that match a particular pattern from some text. One way to do this is to use a regular expression with the begin/end anchors, like so:
var re = new Regex("^pattern$", RegexOptions.Multiline);
string final = re.Replace(initial, "");
This works fine except that it leaves an empty line instead of removing the entire line (including the line break).
To solve this, I added an optional capturing group for the line break, but I want to be sure it includes all of the different flavors of line breaks, so I did it like so:
var re = new Regex(@"^pattern$(\r\n|\r|\n)?", RegexOptions.Multiline);
string final = re.Replace(initial, "");
This works, but it seems like there should be a more straightforward way to do this. Is there a simpler way to reliably remove the entire line including the ending line break (if any)?
Recommended answer
To match any single line break sequence, you may use the (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029]) pattern. So, instead of (\r\n|\r|\n)?, you can use (?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?.
Details:
000A - a newline, \n
000B - a line tabulation char
000C - a form feed char
000D - a carriage return, \r
0085 - a next line char, NEL
2028 - a line separator char
2029 - a paragraph separator char
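As a quick check of the pattern above, the following minimal sketch counts line break sequences in a sample string. The class name LineBreakDemo, the helper CountLineBreaks, and the sample text are made up for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Matches exactly one line break sequence. \r\n is listed first,
    // so a CRLF pair is consumed as a single break rather than two.
    public const string AnyLineBreak = @"(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])";

    // Hypothetical helper: counts line break sequences in the text.
    public static int CountLineBreaks(string text)
    {
        return Regex.Matches(text, AnyLineBreak).Count;
    }

    public static void Main()
    {
        // CRLF, LF, CR, NEL, line separator, paragraph separator.
        string sample = "a\r\nb\nc\rd\u0085e\u2028f\u2029g";
        Console.WriteLine(CountLineBreaks(sample)); // prints 6
    }
}
```

Putting \r\n before the character class matters: with the alternatives in the other order, a CRLF pair would be counted as two separate breaks.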
If you want to remove any 0+ non-horizontal (that is, vertical) whitespace chars after a matched line, you may use [\s-[\p{Zs}\t]]*: any whitespace (\s) except (-[...]) horizontal whitespace (matched with [\p{Zs}\t]). Note that the \p{Zs} Unicode category class does not match tab chars, because the tab is a control character (category Cc) rather than a space separator, which is why \t is listed explicitly.
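To see what the character class subtraction keeps and what it rejects, here is a small sketch that tests single characters against it. The class name VerticalWhitespaceDemo and the helper IsVertical are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerticalWhitespaceDemo
{
    // Any whitespace (\s) minus horizontal whitespace ([\p{Zs}\t]).
    public const string VerticalWhitespace = @"[\s-[\p{Zs}\t]]";

    // Hypothetical helper: true if the char is matched by the class above.
    public static bool IsVertical(char c)
    {
        return Regex.IsMatch(c.ToString(), VerticalWhitespace);
    }

    public static void Main()
    {
        Console.WriteLine(IsVertical('\n'));     // True  (newline)
        Console.WriteLine(IsVertical('\r'));     // True  (carriage return)
        Console.WriteLine(IsVertical('\u2028')); // True  (line separator)
        Console.WriteLine(IsVertical(' '));      // False (space is in Zs)
        Console.WriteLine(IsVertical('\t'));     // False (tab is subtracted explicitly)
        // \p{Zs} on its own does not cover the tab:
        Console.WriteLine(Regex.IsMatch("\t", @"\p{Zs}")); // False
    }
}
```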
One more aspect must be dealt with here since you are using the RegexOptions.Multiline option: it makes $ match before a newline (\n) or at the end of the string. With CRLF line endings, a \r sits between the end of the line's text and the \n, so ^pattern$ fails to match. Hence, add an optional \r? before $ in your pattern.
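The CRLF issue is easy to reproduce. The sketch below is illustrative only: the class name CrLfAnchorDemo, the helper names, and the sample strings are assumptions, and "pattern" stands in for the real line pattern, as in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CrLfAnchorDemo
{
    // The question's original anchors: $ only matches before \n or at the end.
    public static bool MatchesPlain(string text)
    {
        return Regex.IsMatch(text, "^pattern$", RegexOptions.Multiline);
    }

    // Same anchors with an optional \r allowed before $.
    public static bool MatchesWithOptionalCr(string text)
    {
        return Regex.IsMatch(text, @"^pattern\r?$", RegexOptions.Multiline);
    }

    public static void Main()
    {
        string lfText = "keep\npattern\nkeep";
        string crlfText = "keep\r\npattern\r\nkeep";

        Console.WriteLine(MatchesPlain(lfText));            // True
        Console.WriteLine(MatchesPlain(crlfText));          // False: \r precedes \n
        Console.WriteLine(MatchesWithOptionalCr(lfText));   // True
        Console.WriteLine(MatchesWithOptionalCr(crlfText)); // True
    }
}
```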
So, either use
@"^pattern\r?$(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?"
or
@"^pattern\r?$[\s-[\p{Zs}\t]]*"
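The two final patterns behave differently when blank lines follow the matched line. The sketch below runs both on the same input; the class name LineRemover, its method names, and the sample text are hypothetical, and "pattern" is again a placeholder for the real line pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineRemover
{
    // Removes the matched line together with one trailing line break, if any.
    private static readonly Regex LineAndBreak = new Regex(
        @"^pattern\r?$(?:\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029])?",
        RegexOptions.Multiline);

    // Removes the matched line together with all vertical whitespace after it.
    private static readonly Regex LineAndVerticalSpace = new Regex(
        @"^pattern\r?$[\s-[\p{Zs}\t]]*",
        RegexOptions.Multiline);

    public static string RemoveLine(string text)
    {
        return LineAndBreak.Replace(text, "");
    }

    public static string RemoveLineAndBlankLines(string text)
    {
        return LineAndVerticalSpace.Replace(text, "");
    }

    public static void Main()
    {
        // A matching line followed by one empty line, CRLF line endings.
        string initial = "first\r\npattern\r\n\r\nlast";

        // Only the matched line and its own line break are removed;
        // the empty line that followed it stays.
        Console.WriteLine(RemoveLine(initial) == "first\r\n\r\nlast"); // True

        // The following empty line is swallowed as well.
        Console.WriteLine(RemoveLineAndBlankLines(initial) == "first\r\nlast"); // True
    }
}
```

Which variant fits depends on whether blank lines after a removed line should survive: the first keeps them, the second collapses them.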