Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows PowerShell (Windows 10)


Problem description

I've been forcing the usage of chcp 65001 in Command Prompt and Windows PowerShell for some time now, but judging by Q&A posts on SO and several other communities, it seems like a dangerous and inefficient solution. Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry? And if there isn't, is there a publicly announced timeline or agenda to support UTF-8 in the Windows CLI in the future?

Personally, I've been using chcp 949 for Korean character support, but the weird display of the backslash \ and incorrect/incomprehensible displays in several applications (like Neovim), as well as the fact that non-Korean characters aren't supported via 949, have become more of a problem lately.

Recommended answer

Note:

  • This answer shows how to switch the character encoding in the Windows console to UTF-8 (code page 65001), so that shells such as cmd.exe and PowerShell properly encode and decode characters (text) when communicating with external (console) programs and, in cmd.exe, also for file I/O. [1]

If, by contrast, your concern is the separate matter of the limitations of Unicode character rendering in console windows, see the middle and bottom sections of this answer, where alternative console (terminal) applications are also discussed.

Does Microsoft provide an improved / complete alternative to chcp 65001 that can be saved permanently without manual alteration of the Registry?

As of (at least) Windows 10, version 1903, you have the option to set the system locale (language for non-Unicode programs) to UTF-8, but the feature is in beta as of this writing.

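To activate it:

  • Run intl.cpl (which opens the regional settings in Control Panel).
  • Follow the instructions in the screenshot (not reproduced here): on the Administrative tab, choose Change system locale... and check the "Beta: Use Unicode UTF-8 for worldwide language support" box.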

  • This will make all future console windows default to UTF-8 (chcp 65001).

  • Caveats:

  • If you're using Windows PowerShell, this will also make Get-Content and Set-Content (and possibly other contexts where Windows PowerShell defaults to the system's active ANSI code page) default to UTF-8 (which PowerShell Core (v6+) always does). This means that, in the absence of an -Encoding argument, BOM-less files that are ANSI-encoded (which is historically common) will then be misread, and files created with Set-Content will be UTF-8-encoded rather than ANSI-encoded.

  • Up to at least PowerShell 7.0, a bug in the underlying .NET version (.NET Core 3.1) causes follow-on bugs in PowerShell: a UTF-8 BOM is unexpectedly prepended to data sent to external processes via stdin (irrespective of what you set $OutputEncoding to), which notably breaks Start-Job - see this GitHub issue.

  • Not all fonts speak Unicode, so pick a TT (TrueType) font. Even TrueType fonts usually support only a subset of all characters, so you may have to experiment with specific fonts to see whether all the characters you care about are represented - see this answer for details, which also discusses alternative console (terminal) applications that have better Unicode rendering support.

  • As eryksun points out, legacy console applications that do not "speak" UTF-8 will be limited to ASCII-only input and will produce incorrect output when trying to output characters outside the (7-bit) ASCII range. (In the obsolescent Windows 7 and below, programs may even crash.)
    If running legacy console applications is important to you, see eryksun's recommendations in the comments.
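The misreading described in the first caveat can be reproduced with a small sketch. It is an illustration only: ISO-8859-1 stands in for "the system's ANSI code page" (an assumption made so that the example behaves the same on any machine), the file name is made up, and .NET file APIs are called directly so that the encoding is always stated explicitly, which is what the -Encoding argument does for Get-Content and Set-Content.

```powershell
# Stand-in for a BOM-less "ANSI" file: the bytes of "café" in ISO-8859-1 (Latin-1).
# [char] 0xE9 is "é"; it is built from its code point so that this script's own
# file encoding does not matter.
$text   = 'caf' + [char] 0xE9
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$path   = Join-Path ([System.IO.Path]::GetTempPath()) 'ansi-sample.txt'   # hypothetical file name
[System.IO.File]::WriteAllBytes($path, $latin1.GetBytes($text))

# Decoding the same bytes as UTF-8 is what a UTF-8 default does to such a file:
# the lone byte 0xE9 is not valid UTF-8 and becomes U+FFFD (the replacement character).
$misread = [System.IO.File]::ReadAllText($path, (New-Object System.Text.UTF8Encoding))

# Naming the file's real encoding explicitly gives the original text back.
$correct = [System.IO.File]::ReadAllText($path, $latin1)

Remove-Item -LiteralPath $path
```

Here $misread ends in the replacement character while $correct equals the original string, which is why scripts that must handle older ANSI-encoded files are safer when they state the encoding instead of relying on the default.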

However, for Windows PowerShell, setting the system locale is not enough:

  • You must additionally set the $OutputEncoding preference variable to UTF-8: $OutputEncoding = New-Object System.Text.UTF8Encoding; it's simplest to add that command to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file.
  • Fortunately, this is no longer necessary in PowerShell Core, which internally consistently defaults to BOM-less UTF-8.
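Since the answer distinguishes BOM-less UTF-8 from UTF-8 with a BOM, it may help to see how the two differ at the .NET level. The following is a minimal sketch using only standard .NET members; the variable names are arbitrary.

```powershell
# The parameterless constructor creates a UTF-8 encoding WITHOUT a BOM ...
$noBom = New-Object System.Text.UTF8Encoding
# ... whereas the static [System.Text.Encoding]::UTF8 instance is configured WITH a BOM.
$withBom = [System.Text.Encoding]::UTF8

# GetPreamble() returns the BOM bytes an encoding would emit at the start of a stream.
$noBom.GetPreamble().Length      # 0
$withBom.GetPreamble().Length    # 3

# Render the BOM bytes as hex for inspection.
($withBom.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '    # EF BB BF
```

This is one reason to prefer New-Object System.Text.UTF8Encoding over [System.Text.Encoding]::UTF8 when assigning $OutputEncoding: external programs reading from stdin generally do not expect the three BOM bytes.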

If setting the system locale to UTF-8 is not an option in your environment, use startup commands instead:

Note: The caveat regarding legacy console applications mentioned above applies equally here. If running legacy console applications is important to you, see eryksun's recommendations in the comments.

  • For PowerShell (both editions), add the following line to your $PROFILE (current user only) or $PROFILE.AllUsersCurrentHost (all users) file, which is the equivalent of chcp 65001, supplemented with setting preference variable $OutputEncoding to instruct PowerShell to send data to external programs via the pipeline in UTF-8:

    $OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding

  • Note that running chcp 65001 from inside a PowerShell session is not effective, because .NET caches the console's output encoding on startup and is unaware of later changes made with chcp; additionally, as stated, Windows PowerShell requires $OutputEncoding to be set - see this answer for details.

  • For example, here's a quick-and-dirty approach to add the encoding line to $PROFILE programmatically:

    '$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding' + [Environment]::Newline + (Get-Content -Raw $PROFILE) | Set-Content -Encoding utf8 $PROFILE
      

  • For cmd.exe, define an auto-run command via the registry, in value AutoRun of key HKEY_CURRENT_USER\Software\Microsoft\Command Processor (current user only) or HKEY_LOCAL_MACHINE\Software\Microsoft\Command Processor (all users):

    • For example, you can use PowerShell to create this value for you:

      # Auto-execute `chcp 65001` whenever the current user opens a `cmd.exe` console
      # window (including when running a batch file):
      Set-ItemProperty 'HKCU:\Software\Microsoft\Command Processor' AutoRun 'chcp 65001 >NUL'
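Returning to the PowerShell profile: the quick-and-dirty one-liner shown earlier assumes that the $PROFILE file already exists, and it adds the line again on every run. A slightly more defensive variant might look like the sketch below. Add-LineToTop is a hypothetical helper written for this illustration, not a built-in command, and the call that would touch your real profile is left commented out.

```powershell
# Hypothetical helper (not part of PowerShell): prepend $Line to the file at $Path
# unless the file already contains it; create the file if it is missing.
# Returns $true if the file was changed, $false otherwise.
function Add-LineToTop {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Line
    )
    $existing = ''
    if (Test-Path -LiteralPath $Path) {
        # -Raw reads the whole file as one string; the cast turns "no output"
        # (an empty file) into an empty string.
        $existing = [string] (Get-Content -Raw -LiteralPath $Path)
    } else {
        $null = New-Item -ItemType File -Path $Path -Force
    }
    if ($existing.Contains($Line)) { return $false }   # already present: nothing to do
    Set-Content -LiteralPath $Path -Encoding utf8 -Value ($Line + [Environment]::NewLine + $existing)
    return $true
}

$line = '$OutputEncoding = [console]::InputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding'

# Usage (this modifies your profile, so it is commented out here):
# Add-LineToTop -Path $PROFILE -Line $line
```

Taking the path as a parameter keeps the helper easy to try out on a scratch file before pointing it at $PROFILE or $PROFILE.AllUsersCurrentHost.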
          


Optional reading: Why the Windows PowerShell ISE is a poor choice:

While the ISE does have better Unicode rendering support than the console, it is generally a poor choice:

  • First and foremost, the ISE is obsolescent: it doesn't support PowerShell Core, where all future development will go, and it isn't cross-platform, unlike the new premier IDE for both PowerShell editions, Visual Studio Code, which already speaks UTF-8 by default for PowerShell Core and can be configured to do so for Windows PowerShell.

  • The ISE is generally an environment for developing scripts, not for running them in production (if you're writing scripts (also) for others, you should assume that they'll be run in the console); notably, the ISE's behavior is not the same in all aspects when it comes to running scripts.

  • As eryksun points out, the ISE doesn't support running interactive external console programs, namely those that require user input:

    The problem is that it hides the console and redirects the process output (but not input) to a pipe. Most console applications switch to full buffering when a file is a pipe. Also, interactive applications require reading from stdin, which isn't possible from a hidden console window. (It can be unhidden via ShowWindow, but a separate window for input is clunky.)

  • If you're willing to live with that limitation, switching the active code page to 65001 (UTF-8) for proper communication with external programs requires an awkward workaround:

    • You must first force creation of the hidden console window by running any external program from the built-in console, e.g., chcp - you'll see a console window flash briefly.

    • Only then can you set [console]::OutputEncoding (and $OutputEncoding) to UTF-8, as shown above (if the hidden console hasn't been created yet, you'll get a "handle is invalid" error).
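Put together, the two ISE steps could be sketched as follows. This is an illustration of the order of operations described above rather than a tested recipe for every ISE version; the Get-Command guard is an addition that only keeps the snippet harmless on systems where chcp.com is not available.

```powershell
# Step 1: run any external console program so that the ISE creates its hidden console.
# chcp.com ships with Windows; its output is discarded because only the side effect matters.
if (Get-Command chcp.com -ErrorAction SilentlyContinue) { chcp.com > $null }

# Step 2: only now switch the console's output encoding and $OutputEncoding to (BOM-less) UTF-8.
$OutputEncoding = [console]::OutputEncoding = New-Object System.Text.UTF8Encoding
```

Running step 2 first in a fresh ISE session is what produces the "handle is invalid" error mentioned above.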

[1] In PowerShell, if you never call external programs, you needn't worry about the system locale (active code pages): PowerShell-native commands and .NET calls always communicate via UTF-16 strings (native .NET strings), and file I/O applies default encodings that are independent of the system locale. Similarly, because the Unicode versions of the Windows API functions are used to print to and read from the console, non-ASCII characters always print correctly (within the rendering limitations of the console).
In cmd.exe, by contrast, the system locale matters for file I/O too (notably including what encoding to assume for batch-file source code), not just for communicating with external programs, such as when reading program output in a for /f loop.
