具有UTF-8编码的Powershell字符串变量 [英] Powershell string variable with UTF-8 encoding

查看:86
本文介绍了具有UTF-8编码的Powershell字符串变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我检查了许多与此相关的问题,但是找不到解决我问题的方法.基本上,我想将UTF-8编码的字符串存储在变量中,然后将该字符串用作文件名.

I checked many related questions about this, but I couldn't find something that solves my problem. Basically, I want to store a UTF-8 encoded string in a variable and then use that string as a file name.

例如,我正在尝试下载YouTube视频.如果我们打印视频标题,则会显示非英文字符( ytd 此处为

For example, I'm trying to download a YouTube video. If we print the video title, the non-English characters show up (ytd here is youtube-dl):

./ytd https://www.youtube.com/watch?v=GWYndKw_zbw -e

输出: [LEEPLAY]입문입문City Pop MIX(播放列表)

但是,如果我将其存储在变量中并打印出来,则会忽略朝鲜语字符:

But if I store this in a variable and print it, the Korean characters are ignored:

$ vtitle = ./ytd https://www.youtube.com/watch?v=GWYndKw_zbw -e

$ vtitle

输出: [LEEPLAY] City Pop MIX(播放列表)

推荐答案

PowerShell解释外部程序的输出(在您的情况下,例如 ytd ),假定输出使用 [Console] :: OutputEncoding 中反映的字符编码.

When PowerShell interprets output from external programs (such as ytd in your case), it assumes that the output uses the character encoding reflected in [Console]::OutputEncoding.

注意:

  • 解释是指PowerShell 捕获(例如, $ output = ... ),继电器(例如 ... | Select-String ... )或重定向(例如 ...> output.txt )外部程序的输出.

  • Interpreting refers to cases where PowerShell captures (e.g., $output = ...), relays (e.g., ... | Select-String ...), or redirects (e.g., ... > output.txt) the external program's output.

相比之下,由于不涉及PowerShell,因此直接打印到显示器不会受到影响(这解释了为什么 ytd时,字符在控制台中的显示与预期的一样的输出直接打印到它.)

By contrast, printing directly to the display is not affected, because PowerShell isn't involved (which explains why the characters looked as expected in your console when ytd's output printed directly to it).

如果 [Console] :: OutputEncoding 报告的编码与当前外部程序使用的编码不同,则PowerShell 会误解输出.

If the encoding reported by [Console]::OutputEncoding is not the same encoding used by the external program at hand, PowerShell misinterprets the output.

要解决此问题,必须(临时)设置 [Console] :: OutputEncoding] 以匹配外部程序使用的编码.

To fix that, you must (temporarily) set [Console]::OutputEncoding] to match the encoding used by the external program.

例如,让我们假设一个可执行文件 foo.exe 输出UTF-8编码的文本:

For instance, let's assume an executable foo.exe that outputs UTF-8-encoded text:

# Save the current encoding and switch to UTF-8.
$prev = [Console]::OutputEncoding
[Console]::OutputEncoding = [Text.Encoding]::Utf8

# PowerShell now interprets foo's output correctly as UTF-8-encoded.
# and $output will correctly contain CJK characters.
$output = foo https://example.org -e

# Restore the previous encoding.
[Console]::OutputEncoding = $prev

重要:

  • Modifying [Console]::OutputEncoding to match an external program's character encoding is broken in PowerShell Core v6.x , but fortunately has been fixed in v7.0+ - see this GitHub bug report; Windows PowerShell is unaffected.

[Console] :: OutputEncoding 默认情况下反映了与旧系统语言环境的 OEM代码页相关联的编码通过 chcp (例如,在美式英语系统中为 437 ).

[Console]::OutputEncoding by default reflects the encoding associated with the legacy system locale's OEM code page, as reported by chcp (e.g. 437 on US-English systems).

  • Recent versions of Windows 10 now allow setting the system locale to code page 65001 (UTF-8) (the feature is still in beta as of Window 10 version 1909), which is great, considering that most modern command-line utilities "speak" UTF-8.

有了特定的程序, youtube-dl js2010已发现,如果您通过-encode utf进行捕获,则无需费力即可捕获变量-16 .

With the specific program at hand, youtube-dl, js2010 has discovered that capturing in a variable works without extra effort if you pass --encoding utf-16.

之所以可行,是因为最终的UTF16-LE编码输出以 BOM (字节顺序标记)开头.

The reason this works is that the resulting UTF16-LE-encoded output is preceded by a BOM (Byte-Order Mark).

(请注意,-编码utf-8 不能正常工作,因为 youtube-dl 不能 发出BOM.)

(Note that --encoding utf-8 does not work, because youtube-dl then does not emit a BOM.)

Windows PowerShell 能够检测并正确解码UTF-16LE编码和UTF-8编码的文本,而不考虑有效的 [Console] :: OutputEncoding] 如果且仅在输出之前带有BOM表,则为.

Windows PowerShell is capable of detecting and properly decoding UTF-16LE-encoded and UTF-8-encoded text irrespective of the effective [Console]::OutputEncoding] IF AND ONLY IF the output is preceded by a BOM.

注意事项:

  • 这在PowerShell Core (在任何受支持的平台上)上都不起作用.

  • This does not work in PowerShell Core (on any of the supported platforms).

即使在Windows PowerShell中,您也很少能够利用这种晦涩的行为,因为在 stdout输出中使用BOM是非典型(它是通常仅用于文件).

Even in Windows PowerShell you'll rarely be able to take advantage of this obscure behavior, because using a BOM in stdout output is atypical (it is typically only used in files).

这篇关于具有UTF-8编码的Powershell字符串变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆