How to split a string containing newlines

Question

A string (extracted from an Outlook email message's body.innerText) contains embedded newlines. How can I split it into an array of strings?

I would expect this example string to be split into an array of two (2) items. Instead, it becomes an array of three (3) items, with an empty item in the middle.

PS C:\src\t> ("This is`r`na string.".Split([Environment]::NewLine)) | % { $_ }
This is

a string.
PS C:\src\t> "This is `r`na string.".Split([Environment]::NewLine) | Out-String | Format-Hex

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   54 68 69 73 20 69 73 20 0D 0A 0D 0A 61 20 73 74  This is ....a st
00000010   72 69 6E 67 2E 0D 0A                             ring...

Recommended answer

To treat a CRLF sequence as a whole as the separator, it is simpler to use the -split operator, which is regex-based:

PS> "This is `r`n`r`n a string." -split '\r?\n'
This is 

 a string.

Note:

  • \r?\n matches both CRLF (Windows-style) and LF-only (Unix-style) newlines; use \r\n if you really want to match only CRLF sequences.

  • Note the use of a single-quoted string ('...'), which passes the string containing the regex as-is through to the .NET regex engine; the regex engine uses \ as its escape character, hence \r and \n rather than PowerShell's `r and `n.
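
As a minimal sketch of the difference between the two patterns, the following splits a sample string that mixes both newline styles. The variable names and the sample text are made up for illustration.

```powershell
# Hypothetical sample text: one CRLF (Windows-style) and one LF-only (Unix-style) newline.
$text = "line 1`r`nline 2`nline 3"

# '\r?\n' treats both a CRLF sequence and a lone LF as a single separator.
$anyNewline = $text -split '\r?\n'

# '\r\n' matches only CRLF, so the lone LF stays inside the second element.
$crlfOnly = $text -split '\r\n'

"any newline: $($anyNewline.Count) elements"   # 3 elements
"CRLF only:   $($crlfOnly.Count) elements"     # 2 elements
```

With '\r\n', the second element still contains the embedded LF, which is usually not what you want for text of mixed or unknown origin.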

PowerShell's -split operator is generally a superior alternative to the [string] .NET type's .Split() method - see this answer.

As for what you tried:

The separator argument, [Environment]::NewLine, is the string "`r`n" on Windows, i.e. a CRLF sequence.

  • In PowerShell [Core] v6+, your command does work, because the string as a whole is treated as the separator.

  • In Windows PowerShell, as Steven points out in his helpful answer, the individual characters, CR and LF, are each treated as separators, which results in an extra, empty element in the result array: the empty string between the CR and the LF.
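
The Windows PowerShell result can be reproduced in either edition by passing the separators explicitly as a [char[]] array, which selects the character-based overload. This is only a sketch; the variable names are illustrative.

```powershell
# Casting the CRLF string to [char[]] yields two separate separator characters: CR and LF.
$separators = [char[]] "`r`n"

# Each character splits on its own, so the empty string between CR and LF becomes an element.
$parts = "This is`r`na string.".Split($separators)

$parts.Count        # 3
$parts[1] -eq ''    # True: the extra, empty middle element
```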

This change in behavior happened outside of PowerShell's control: .NET Core introduced a new .Split() method overload with a [string]-typed separator parameter, which PowerShell's overload-resolution algorithm now selects over the older overload with the [char[]]-typed parameter.
Sidestepping such (albeit rare) inadvertent behavioral changes, which PowerShell itself cannot prevent, is another good reason to prefer the PowerShell-native -split operator over the .NET [string] type's .Split() method.
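
If .Split() has to be used anyway, one way to get the same result in both editions is to select an overload explicitly instead of relying on overload resolution. The sketch below assumes the goal is to treat the CRLF sequence as a whole as the separator, and uses the long-standing Split(String[], StringSplitOptions) overload; the variable name is illustrative.

```powershell
# Casting the separator to [string[]] and passing a StringSplitOptions value selects
# the string-array overload, so the whole CRLF sequence acts as one separator.
$parts = "This is`r`na string.".Split([string[]] "`r`n", [StringSplitOptions]::None)

$parts.Count   # 2
```

This is more verbose than -split '\r?\n' and matches CRLF only, so the operator remains the simpler choice in most cases.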
