How to cat a UTF-8 (no BOM) file properly/globally in PowerShell?
Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to €
In cmd.exe:
type utf8.txt > out.txt
The content of out.txt is €
In PowerShell (v4):
cat .\utf8.txt > out.txt
or
type .\utf8.txt > out.txt
The content of out.txt is â‚¬
How do I globally make PowerShell work correctly?
Note: This answer is about Windows PowerShell (up to v5.1); PowerShell Core (v6+), the cross-platform edition of PowerShell, now fortunately defaults to BOM-less UTF-8 for both input and output.
Windows PowerShell, unlike the underlying .NET framework [1], uses the following defaults:
- On input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).
- On output: the > and >> redirection operators produce UTF-16 LE files by default (which do have, and need, a BOM).
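The input-side default is what garbles the output in the question. As a minimal sketch, assuming the system's ANSI code page is Windows-1252 (common on US-English and Western European systems; other code pages produce different garbage), the misreading can be reproduced directly with .NET encoding objects:

```powershell
# Build the euro sign from its code point (U+20AC) so that the encoding of
# this script file itself does not matter.
$euro = [string][char]0x20AC

# The UTF-8 encoding of the euro sign is the three bytes E2 82 AC.
$bytes = [System.Text.Encoding]::UTF8.GetBytes($euro)
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '   # E2 82 AC

# Assumption: the ANSI code page is Windows-1252. Decoding the same bytes
# with it yields three separate characters instead of one.
$ansi = [System.Text.Encoding]::GetEncoding(1252)
$misread = $ansi.GetString($bytes)
$misread.Length   # 3
```

Each of the three bytes is treated as a character of its own, which is why a single € turns into a three-character sequence once the file has been read without an explicit encoding.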
File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
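To get closer to the "globally" part of the question, such defaults could be set once per session, for instance from the $PROFILE script. The following is only a sketch: the Out-File entry comes from the answer above, while the Get-Content entry is an assumption that the same mechanism applies to that cmdlet's -Encoding parameter, so it is worth verifying on your own system.

```powershell
# Sketch for Windows PowerShell v5.1; typically placed in the $PROFILE file
# so that it applies to every new session.

# Make Out-File, and therefore > and >> in v5.1+, write UTF-8 (with a BOM).
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

# Assumption: the same mechanism makes Get-Content (alias: cat, type) read
# BOM-less files as UTF-8 instead of the ANSI code page.
$PSDefaultParameterValues['Get-Content:Encoding'] = 'utf8'

# Show the defaults currently in effect.
$PSDefaultParameterValues
```

These entries only change the default; an explicit -Encoding argument on an individual call still takes precedence.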
For PowerShell to handle UTF-8 properly, you must specify it as both the input and output encoding [2], but note that on output, PowerShell invariably adds a BOM to UTF-8 files.
Applied to your example:
Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt
To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
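One commonly used way to get BOM-less output from Windows PowerShell is to bypass the cmdlets and call .NET directly. The sketch below is not necessarily the approach of the linked answer; it recreates the question's sample file in the current directory (the file names are the ones from the question) and writes the output with a System.Text.UTF8Encoding instance constructed with $false, which suppresses the BOM.

```powershell
# .NET methods do not share PowerShell's current location, so build full paths.
$dir     = (Get-Location).ProviderPath
$inPath  = Join-Path $dir 'utf8.txt'
$outPath = Join-Path $dir 'out.txt'

# UTF8Encoding($false) is a UTF-8 encoding that emits no BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# Create the sample input: a euro sign (U+20AC) stored as BOM-less UTF-8.
[System.IO.File]::WriteAllText($inPath, [string][char]0x20AC, $utf8NoBom)

# Read it explicitly as UTF-8, then write it back out without a BOM.
$lines = Get-Content -Encoding utf8 $inPath
[System.IO.File]::WriteAllLines($outPath, [string[]]$lines, $utf8NoBom)

# Inspect the raw bytes: they start with E2 82 AC, not with the BOM EF BB BF.
$outBytes = [System.IO.File]::ReadAllBytes($outPath)
($outBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```

WriteAllLines appends the platform's newline after each line, so the byte dump shows newline bytes after E2 82 AC.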
[1] The .NET framework uses (BOM-less) UTF-8 by default, both for in- and output.
This - intentional - difference in behavior between PowerShell and the .NET framework it is built on is unusual.
[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.