How to cat a UTF-8 (no BOM) file properly/globally in PowerShell? (to another file)

Problem description

Create a file utf8.txt. Ensure the encoding is UTF-8 (no BOM). Set its content to

In cmd.exe:

type utf8.txt > out.txt

Content of out.txt is

In PowerShell (v4):

cat .\utf8.txt > out.txt

or

type .\utf8.txt > out.txt

Out.txt content is €

How do I globally make PowerShell work correctly?

Solution

Note: This answer is about Windows PowerShell (up to v5.1); PowerShell [Core, v6+], the cross-platform edition of PowerShell, now fortunately defaults to BOM-less UTF-8 on both in- and output.


Windows PowerShell, unlike the underlying .NET Framework[1], uses the following defaults:

  • on input: files without a BOM (byte-order mark) are assumed to be in the system's default encoding, which is the legacy Windows code page ("ANSI" code page: the active, culture-specific single-byte encoding, as configured via Control Panel).

  • on output: the > and >> redirection operators produce UTF-16 LE files by default (which do have - and need - a BOM).
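The two defaults above can be made visible at the byte level. The following is only a sketch: it assumes the legacy "ANSI" code page is Windows-1252 (common on Western European systems; yours may differ) and uses the Euro sign (U+20AC) as sample text. Character codes are used instead of literals so that the encoding of the script file itself does not matter.

```powershell
# Sample character: the Euro sign, U+20AC
$euro = [string][char]0x20AC

# How the character is stored in a UTF-8 file (3 bytes)
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($euro)
($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '    # E2 82 AC

# Input side: decoding those bytes with an ANSI code page (1252 is an assumption)
# turns each byte into a separate character, i.e. 3 wrong characters instead of 1
$ansi    = [System.Text.Encoding]::GetEncoding(1252)
$misread = $ansi.GetString($utf8Bytes)
$misread.Length                                                   # 3

# Output side: the same character as UTF-16 LE, preceded by the BOM (FF FE)
$utf16 = [System.Text.Encoding]::Unicode
$utf16Bytes = $utf16.GetPreamble() + $utf16.GetBytes($euro)
($utf16Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '   # FF FE AC 20
```

So a BOM-less UTF-8 file that is read with the ANSI default and written back with > ends up as a UTF-16 LE file containing the misread characters.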

File-consuming and -producing cmdlets do usually support an -Encoding parameter that lets you specify the encoding explicitly.
Prior to Windows PowerShell v5.1, using the underlying Out-File cmdlet explicitly was the only way to change the encoding.
In Windows PowerShell v5.1+, > and >> became effective aliases of Out-File, allowing you to change the encoding behavior of > and >> via the $PSDefaultParameterValues preference variable; e.g.:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
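As a rough sketch of a session-wide setup, the same mechanism can also preset -Encoding for Get-Content, so that neither side needs the parameter spelled out. The temp-file paths below are hypothetical and only there to make the example self-contained. The settings last for the current session; placing the two assignments in your $PROFILE file is one way to apply them to every session. In versions before v5.1 (such as the v4 from the question), > is not expected to honor the Out-File default, as described above.

```powershell
# Preset -Encoding for reading and writing (applies to aliases such as cat/type too)
$PSDefaultParameterValues['Get-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Out-File:Encoding']    = 'utf8'

# Hypothetical sample files in the temp directory
$dir = [System.IO.Path]::GetTempPath()
$src = Join-Path $dir 'utf8.txt'
$dst = Join-Path $dir 'out.txt'

# Create a BOM-less UTF-8 input file containing the Euro sign (U+20AC);
# .NET's WriteAllText writes BOM-less UTF-8 by default
[System.IO.File]::WriteAllText($src, [string][char]0x20AC)

# No explicit -Encoding needed any more
Get-Content $src > $dst

# Read the result back; it should round-trip as a single character
$roundTrip = Get-Content $dst
$roundTrip.Length
```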

For Windows PowerShell to handle UTF-8 properly, you must specify it as both the input and output encoding[2], but note that on output, PowerShell invariably adds a BOM to UTF-8 files.

Applied to your example:

Get-Content -Encoding utf8 .\utf8.txt | Out-File -Encoding utf8 out.txt

To create a UTF-8 file without a BOM in PowerShell, see this answer of mine.
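The linked answer is not reproduced here. As one possible sketch (not necessarily the approach that answer takes), you can bypass the cmdlets and call the .NET file APIs directly, relying on the BOM-less UTF-8 default described in footnote [1]. The file locations are hypothetical. Note that .NET methods should be given full paths, because .NET's working directory can differ from PowerShell's current location.

```powershell
# Hypothetical sample files in the temp directory (full paths for .NET)
$dir     = [System.IO.Path]::GetTempPath()
$inPath  = Join-Path $dir 'utf8.txt'
$outPath = Join-Path $dir 'out.txt'

# UTF-8 encoder that does not emit a BOM
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# Create BOM-less UTF-8 input containing the Euro sign (U+20AC)
[System.IO.File]::WriteAllText($inPath, [string][char]0x20AC, $utf8NoBom)

# ReadAllLines decodes as UTF-8 by default; write the lines back without a BOM
$lines = [System.IO.File]::ReadAllLines($inPath)
[System.IO.File]::WriteAllLines($outPath, $lines, $utf8NoBom)

# Inspect the result: it should start with E2 82 AC rather than EF BB BF
$outBytes = [System.IO.File]::ReadAllBytes($outPath)
($outBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```

WriteAllLines appends the platform's newline after each line, so the output has trailing newline bytes after the three bytes of the Euro sign.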


[1] .NET Framework uses (BOM-less) UTF-8 by default, both for in- and output.
This - intentional - difference in behavior between Windows PowerShell and the framework it is built on is unusual. The difference went away in PowerShell [Core] v6+: both .NET [Core] and PowerShell [Core] default to BOM-less UTF-8.

[2] Get-Content does, however, automatically recognize UTF-8 files with a BOM.
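To check which case applies to a given file, you can look at its first three bytes: a UTF-8 BOM is the sequence EF BB BF. The helper below is a minimal sketch; the function name Test-Utf8Bom is made up for illustration, and it reads the whole file into memory, which is fine for small files only.

```powershell
function Test-Utf8Bom {
    param([Parameter(Mandatory)][string] $Path)

    # Resolve to a full path, since .NET's working directory can differ from PowerShell's
    $fullPath = Convert-Path -LiteralPath $Path
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)

    # True only if the file starts with the UTF-8 BOM bytes EF BB BF
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}
```

For example, Test-Utf8Bom .\out.txt reports whether the file written by the Out-File command above starts with a BOM.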
