Changing PowerShell's default output encoding to UTF-8
Question
By default, when you redirect the output of a command to a file or pipe it into something else in PowerShell, the encoding is UTF-16, which isn't useful. I'm looking to change it to UTF-8.
It can be done on a case-by-case basis by replacing the > foo.txt syntax with | out-file foo.txt -encoding utf8, but this is awkward to have to repeat every time.
The persistent way to set things in PowerShell is to put them in \Users\me\Documents\WindowsPowerShell\profile.ps1; I've verified that this file is indeed executed on startup.
It has been said that the output encoding can be set with $PSDefaultParameterValues = @{'Out-File:Encoding' = 'utf8'}, but I've tried this and it had no effect.
https://blogs.msdn.microsoft.com/powershell/2006/12/11/outputencoding-to-the-rescue/, which talks about $OutputEncoding, looks at first glance as though it should be relevant, but then it talks about output being encoded in ASCII, which is not what's actually happening.
How do you set PowerShell to use UTF-8?
Answer
Note:
The next section applies primarily to Windows PowerShell.
- See the section after it for the cross-platform PowerShell Core (v6+) edition.
In both cases, the information applies to making PowerShell use UTF-8 for reading and writing files.
- By contrast, for information on how to send and receive UTF-8-encoded strings to and from external programs, see this answer.
In PSv5.1 or higher, where > and >> are effectively aliases of Out-File, you can set the default encoding for > / >> / Out-File via the $PSDefaultParameterValues preference variable:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
- Note:
  - In Windows PowerShell (the legacy edition whose latest and final version is v5.1), this invariably creates UTF-8 files with a (pseudo) BOM.
    - Many Unix-based utilities do not recognize this BOM (see bottom); see this post for workarounds that create BOM-less UTF-8 files.
  - In PowerShell (Core) v6+, BOM-less UTF-8 is the default (see next section), but if you do want a BOM there, you can use 'utf8BOM'.
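The BOM difference described in the note above comes down to three leading bytes. The following Python sketch (Python is used here only to show the bytes; it is not how PowerShell is implemented) encodes the same text with and without the UTF-8 pseudo-BOM, roughly corresponding to what 'utf8' produces in Windows PowerShell versus PowerShell (Core) v6+. The helper name has_utf8_bom is hypothetical.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 pseudo-BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


text = "hello"

# Python's 'utf-8-sig' codec prepends the 3-byte pseudo-BOM,
# similar to what 'utf8' yields in Windows PowerShell.
with_bom = text.encode("utf-8-sig")

# Plain 'utf-8' writes no BOM, similar to the PowerShell (Core) v6+ default.
without_bom = text.encode("utf-8")

print(with_bom.hex(" "))     # ef bb bf 68 65 6c 6c 6f
print(without_bom.hex(" "))  # 68 65 6c 6c 6f
print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))  # True False
```

Checking the first three bytes of a file this way is a quick manual test of whether a given command produced a BOM.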
In PSv5.0 or below, you cannot change the encoding for > / >>, but, on PSv3 or higher, the above technique does work for explicit calls to Out-File. (The $PSDefaultParameterValues preference variable was introduced in PSv3.0.)
In PSv3.0 or higher, if you want to set the default encoding for all cmdlets that support an -Encoding parameter (which in PSv5.1+ includes > and >>), use:
$PSDefaultParameterValues['*:Encoding'] = 'utf8'
If you place this command in your $PROFILE, cmdlets such as Out-File and Set-Content will use UTF-8 encoding by default, but note that this makes it a session-global setting that will affect all commands / scripts that do not explicitly specify an encoding via their -Encoding parameter.
Similarly, be sure to include such commands in any scripts or modules that you want to behave the same way, so that they indeed behave the same even when run by another user or on a different machine; however, to avoid a session-global change, use the following form to create a local copy of $PSDefaultParameterValues:
$PSDefaultParameterValues = @{ '*:Encoding' = 'utf8' }
For a summary of the wildly inconsistent default character encoding behavior across many of the Windows PowerShell standard cmdlets, see the bottom section.
The automatic $OutputEncoding variable is unrelated and only applies to how PowerShell communicates with external programs (what encoding PowerShell uses when sending strings to them) - it has nothing to do with the encoding that the output redirection operators and PowerShell cmdlets use to save to files.
Optional reading: The cross-platform perspective: PowerShell Core:
PowerShell is now cross-platform, via its PowerShell Core edition, whose encoding - sensibly - defaults to BOM-less UTF-8, in line with Unix-like platforms.
This means that source-code files without a BOM are assumed to be UTF-8, and using > / Out-File / Set-Content defaults to BOM-less UTF-8; explicit use of the utf8 -Encoding argument too creates BOM-less UTF-8, but you can opt to create files with the pseudo-BOM with the utf8bom value.
If you create PowerShell scripts with an editor on a Unix-like platform, and nowadays even on Windows with cross-platform editors such as Visual Studio Code and Sublime Text, the resulting *.ps1 file will typically not have a UTF-8 pseudo-BOM:
- This works fine on PowerShell Core.
- It may break on Windows PowerShell, if the file contains non-ASCII characters; if you do need to use non-ASCII characters in your scripts, save them as UTF-8 with BOM.
Without the BOM, Windows PowerShell (mis)interprets your script as being encoded in the legacy "ANSI" codepage (determined by the system locale for pre-Unicode applications; e.g., Windows-1252 on US-English systems).
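The misinterpretation described above can be reproduced at the byte level. This Python sketch assumes a US-English system, whose "ANSI" code page is Windows-1252 (Python codec name cp1252); it only simulates the decoding step and does not run PowerShell.

```python
# Source text of a script containing a non-ASCII character.
script = 'Write-Output "café"'

# Saved by a cross-platform editor as BOM-less UTF-8.
data = script.encode("utf-8")

# Decoded as UTF-8 (what PowerShell Core assumes without a BOM): intact.
print(data.decode("utf-8"))   # Write-Output "café"

# Decoded as Windows-1252 (what Windows PowerShell assumes without a BOM
# on a US-English system): "é" (bytes C3 A9) turns into two characters.
print(data.decode("cp1252"))  # Write-Output "cafÃ©"

# Saving with the UTF-8 pseudo-BOM makes the encoding unambiguous.
print(script.encode("utf-8-sig")[:3])  # b'\xef\xbb\xbf'
```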
Conversely, files that do have the UTF-8 pseudo-BOM can be problematic on Unix-like platforms, as they cause Unix utilities such as cat, sed, and awk - and even some editors such as gedit - to pass the pseudo-BOM through, i.e., to treat it as data.
- This may not always be a problem, but it definitely can be, such as when you try to read a file into a string in bash with, say, text=$(cat file) or text=$(<file): the resulting variable will contain the pseudo-BOM as its first 3 bytes.
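The same pass-through effect can be observed from Python, used here as a stand-in for a BOM-unaware tool such as cat: decoding with the plain utf-8 codec keeps the BOM as the character U+FEFF, whereas the BOM-aware utf-8-sig codec strips it. The file name file.txt is arbitrary.

```python
import os
import tempfile

# Create a file that starts with the UTF-8 pseudo-BOM.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("hello")

# Raw bytes: the first 3 bytes are the pseudo-BOM.
with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'

# BOM-unaware reading passes the BOM through as data (U+FEFF).
with open(path, encoding="utf-8") as f:
    naive = f.read()
print(repr(naive), naive == "hello")  # '\ufeffhello' False

# BOM-aware reading strips it.
with open(path, encoding="utf-8-sig") as f:
    aware = f.read()
print(aware == "hello")  # True
```

String comparisons, JSON parsers, and shebang-line detection are typical places where such an invisible leading U+FEFF causes surprises.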
Inconsistent default encoding behavior in Windows PowerShell:
Regrettably, the default character encoding used in Windows PowerShell is wildly inconsistent; the cross-platform PowerShell Core edition, as discussed in the previous section, has commendably put an end to this.
Note:
The following doesn't aspire to cover all standard cmdlets.
Googling cmdlet names to find their help topics now shows you the PowerShell Core version of the topics by default; use the version drop-down list above the list of topics on the left to switch to a Windows PowerShell version.
As of this writing, the documentation frequently incorrectly claims that ASCII is the default encoding in Windows PowerShell - see this GitHub docs issue.
Cmdlets that write:
Out-File and > / >> create "Unicode" (UTF-16LE) files by default, in which every ASCII-range character, too, is represented by 2 bytes; this notably differs from Set-Content / Add-Content (see next point). New-ModuleManifest and Export-CliXml also create UTF-16LE files.
Set-Content (and Add-Content, if the file doesn't yet exist or is empty) uses ANSI encoding (the encoding specified by the active system locale's ANSI legacy code page, which PowerShell calls Default).
Export-Csv indeed creates ASCII files, as documented, but see the notes re -Append below.
Export-PSSession creates UTF-8 files with a BOM by default.
New-Item -Type File -Value currently creates BOM-less(!) UTF-8.
The Send-MailMessage help topic also claims that ASCII encoding is the default; I have not personally verified that claim.
Start-Transcript invariably creates UTF-8 files with a BOM, but see the notes re -Append below.
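To see how different these defaults are on disk, the following Python sketch encodes the same short string the way each cmdlet family listed above would by default. The mapping to Python codecs is an approximation under stated assumptions: "ANSI" is assumed to be Windows-1252, and for ASCII, Python's errors="replace" (which substitutes '?') stands in for however Export-Csv handles non-ASCII characters.

```python
import codecs

text = "Hi é"

# Approximate byte-level equivalents of the Windows PowerShell defaults
# described above (ANSI assumed to be cp1252).
DEFAULTS = [
    ("Out-File, > (UTF-16LE with BOM)",
     lambda s: codecs.BOM_UTF16_LE + s.encode("utf-16-le")),
    ("Set-Content (ANSI, assumed cp1252)",
     lambda s: s.encode("cp1252")),
    ("Export-PSSession, Start-Transcript (UTF-8 with BOM)",
     lambda s: s.encode("utf-8-sig")),
    ("New-Item -Value (BOM-less UTF-8)",
     lambda s: s.encode("utf-8")),
    ("Export-Csv (ASCII, non-ASCII replaced)",
     lambda s: s.encode("ascii", errors="replace")),
]

for label, encode in DEFAULTS:
    data = encode(text)
    print(f"{label}: {len(data)} bytes -> {data.hex(' ')}")
```

For this 4-character string, the UTF-16LE variant takes 10 bytes (ff fe 48 00 69 00 20 00 e9 00), while BOM-less UTF-8 takes 5, which is the size and tooling mismatch the question complains about.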
Re commands that append to an existing file:
>> / Out-File -Append make no attempt to match the encoding of a file's existing content. That is, they blindly apply their default encoding, unless instructed otherwise with -Encoding, which is not an option with >> (except indirectly in PSv5.1+, via $PSDefaultParameterValues, as shown above). In short: you must know the encoding of an existing file's content and append using that same encoding.
Add-Content is the laudable exception: in the absence of an explicit -Encoding argument, it detects the existing encoding and automatically applies it to the new content. Thanks, js2010. Note that in Windows PowerShell this means that ANSI encoding is applied if the existing content has no BOM, whereas in PowerShell Core it is UTF-8.
This inconsistency between Out-File -Append / >> and Add-Content, which also affects PowerShell Core, is discussed in this GitHub issue.
Export-Csv -Append partially matches the existing encoding: it blindly appends UTF-8 if the existing file's encoding is any of ASCII/UTF-8/ANSI, but correctly matches UTF-16LE and UTF-16BE.
To put it differently: in the absence of a BOM, Export-Csv -Append assumes UTF-8, whereas Add-Content assumes ANSI.
Start-Transcript -Append partially matches the existing encoding: it correctly matches encodings with a BOM, but defaults to potentially lossy ASCII encoding in the absence of one.
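Why blind appending matters is easiest to see in the raw bytes. This Python sketch simulates an append by concatenating encoded bytes (it does not model whether PowerShell writes a BOM when appending): BOM-less UTF-8 content is followed by UTF-16LE text, the Windows PowerShell default for >> / Out-File -Append, and the result is compared with an append that matches the existing encoding.

```python
# Existing file content: BOM-less UTF-8, e.g. created by another tool.
existing = "first\n".encode("utf-8")

# Blind append using UTF-16LE, simulated by byte concatenation.
blind = existing + "second\n".encode("utf-16-le")
print(repr(blind.decode("utf-8")))
# 'first\ns\x00e\x00c\x00o\x00n\x00d\x00\n\x00'

# Appending with the file's actual encoding keeps the content consistent.
matched = existing + "second\n".encode("utf-8")
print(repr(matched.decode("utf-8")))  # 'first\nsecond\n'

# ASCII (the Start-Transcript -Append fallback) is lossy for non-ASCII text;
# Python's errors="replace" substitutes '?'.
print("café".encode("ascii", errors="replace"))  # b'caf?'
```

The embedded NUL bytes in the first result are what tools such as grep or diff typically trip over when a file has been appended to with a mismatched encoding.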
Cmdlets that read (that is, the encoding used in the absence of a BOM):
Get-Content and Import-PowerShellDataFile default to ANSI (Default), which is consistent with Set-Content.
ANSI is also what the PowerShell engine itself defaults to when it reads source code from files.
By contrast, Import-Csv, Import-CliXml, and Select-String assume UTF-8 in the absence of a BOM.
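The read-side rule summarized above (a BOM, if present, determines the encoding; otherwise each cmdlet applies its own fallback) can be modeled with a small Python function. read_text is a hypothetical helper, it recognizes only the BOMs discussed in this answer, and "ANSI" is again assumed to be Windows-1252; this models the described behavior rather than reproducing PowerShell's code.

```python
import codecs

# BOMs discussed above; none of these byte sequences is a prefix of another.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_text(data: bytes, fallback: str) -> str:
    """Decode bytes, honoring a BOM if present, else using `fallback`."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(fallback)


bomless = "café".encode("utf-8")
with_bom = "café".encode("utf-8-sig")

# ANSI fallback, as with Get-Content in Windows PowerShell (cp1252 assumed).
print(read_text(bomless, "cp1252"))   # cafÃ©
# UTF-8 fallback, as with Import-Csv, Import-CliXml, and Select-String.
print(read_text(bomless, "utf-8"))    # café
# With a BOM present, the fallback no longer matters.
print(read_text(with_bom, "cp1252"))  # café
```

This is also why saving files with a BOM is the most robust choice when they must be read by Windows PowerShell cmdlets with differing fallbacks.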