Why does PowerShell redirection >> change the formatting of the text content?

Problem description

I want to use the redirect append >> or write > to write to a txt file, but when I do, I receive a weird format "\x00a\x00p...".

I can use Set-Content and Add-Content successfully. Why do they work as expected, while the >> and > redirection operators do not?

Here is the output, shown both with PowerShell's cat and with a simple Python print:

rocket_brain> new-item test.txt
rocket_brain> "appended using add-content" | add-content test.txt
rocket_brain> cat test.txt

 appended using add-content

But then, if I use the redirect-append operator >>:

rocket_brain> "appended using redirect" >> test.txt
rocket_brain> cat test.txt

 appended using add-content
 a p p e n d e d   u s i n g   r e d i r e c t

Simple Python script: read_test.py

with open("test.txt", "r") as file:   # open test.txt in read mode
    data = file.readlines()           # read all lines into the list data
    print(data)                       # print the list, one item per line read

Using read_test.py I see a difference in formatting

rocket_brain> python read_test.py
 ['appended using add-content\n', 'a\x00p\x00p\x00e\x00n\x00d\x00e\x00d\x00 \x00u\x00s\x00i\x00n\x00g\x00 \x00r\x00e\x00d\x00i\x00r\x00e\x00c\x00t\x00\r\x00\n', '\x00']
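Text mode decodes the bytes and translates newlines, which can blur what is actually stored. A binary-mode variant of read_test.py shows the raw bytes instead. This is a minimal sketch: the helper name read_raw_lines is hypothetical, and the demo file is rebuilt by hand under the assumption that Add-Content wrote Windows-1252 ("cp1252") and >> appended UTF-16LE without a BOM.

```python
import os
import tempfile


def read_raw_lines(path: str) -> list[bytes]:
    """Return the file's lines as raw bytes (binary mode: no decoding, no newline translation)."""
    with open(path, "rb") as f:
        return f.readlines()  # splits after each b"\n"


# Hypothetical demo file: one ANSI line (assumed cp1252), then one UTF-16LE line without a BOM
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write("appended using add-content\r\n".encode("cp1252"))
    f.write("appended using redirect\r\n".encode("utf-16-le"))

lines = read_raw_lines(path)
for line in lines:
    print(line)
```

The second line comes out as b'a\x00p\x00...', and a lone b'\x00' is left over after its final b'\n'. These are the 0x00 bytes of the second encoding.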

NOTE: If I use only the redirect append >> (or write >) without first using Add-Content, the cat output looks normal (instead of spaced out). However, the Python script then shows the \x00p format for every line, including lines added by any later Add-Content command once the file was started with >. When I open the file in Notepad (or VS, etc.), the text always looks as expected. Using >> or > in cmd (instead of PS) also stores the text in the expected ASCII format.

Related links: cmd redirection operators, PS redirection operators

Recommended answer

Note: The problem is ultimately that in Windows PowerShell, different cmdlets/operators use different default encodings. This has been resolved in PowerShell Core (v6+), where BOM-less UTF-8 is used consistently.

  • >> blindly applies Out-File's default encoding when appending to an existing file (in effect, > behaves like Out-File and >> like Out-File -Append), which in Windows PowerShell is the encoding named Unicode, i.e., UTF-16LE, where most characters are encoded as 2-byte sequences, even those in the ASCII range; the latter have a 0x0 (NUL) as the high byte.

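  • Therefore, unless the target file's existing content uses the same encoding, you end up with a mix of different encodings, which is what happened in your case. [1]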
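The byte-level difference is easier to see with a Python analogue (not PowerShell) that encodes the same line in both Windows PowerShell defaults. This is a sketch under one assumption: the active ANSI code page is Windows-1252 ("cp1252").

```python
# Assumption: the active ANSI code page is Windows-1252 ("cp1252").
line = "appended using redirect\r\n"

# Add-Content/Set-Content with "Default" (ANSI): one byte per character
ansi_bytes = line.encode("cp1252")

# >> with "Unicode" (UTF-16LE): two bytes per character. No BOM here,
# matching an append to a file that already exists.
utf16_bytes = line.encode("utf-16-le")

print(len(ansi_bytes), len(utf16_bytes))   # 25 50
# Every ASCII character gets a 0x00 high byte (the odd-indexed bytes)
print(all(b == 0 for b in utf16_bytes[1::2]))   # True
print(utf16_bytes[:8])
```

The same 25 characters take 50 bytes in UTF-16LE, and half of those bytes are NUL.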

Add-Content, by contrast, does try to detect a file's existing encoding (thanks again, js2010). However, you used it on an empty file. In that case Set-Content's default encoding is applied, which in Windows PowerShell is the encoding named Default, i.e., your system's active ANSI code page.
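As a rough illustration of how encoding detection can work, and why an empty file gives it nothing to go on, here is a simplified Python sketch. The helper guess_encoding is hypothetical. It only sniffs for a byte-order mark and falls back to ANSI (assumed to be "cp1252"), and it does not reproduce Add-Content's actual detection logic.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical, simplified BOM sniffing; NOT PowerShell's actual algorithm."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with BOM; the codec strips the BOM when decoding
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"     # Python's utf-16 codec reads the BOM to pick the byte order
    # No BOM (including an empty file): fall back to the ANSI code page (assumed cp1252)
    return "cp1252"


empty = b""
utf16_file = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
utf8_file = codecs.BOM_UTF8 + b"hi"

for data in (empty, utf16_file, utf8_file):
    enc = guess_encoding(data)
    print(enc, repr(data.decode(enc)))
```

An empty file has no BOM, so the sketch falls through to the ANSI default, just as Add-Content did here.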

Therefore, to match the single-byte ANSI encoding initially created by your Add-Content call when appending further content, use Out-File -Append -Encoding Default instead of >>, or simply keep using Add-Content.
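The underlying rule is to write every piece of the file with the same encoding. Here is a Python analogue of that rule (not PowerShell): a hypothetical append_line helper always appends with one fixed encoding, assumed here to be the ANSI code page "cp1252".

```python
import os
import tempfile

# Assumption: the ANSI code page is Windows-1252 ("cp1252").
ENCODING = "cp1252"


def append_line(path: str, text: str, encoding: str = ENCODING) -> None:
    """Hypothetical helper: append one CRLF-terminated line using a fixed encoding."""
    # newline="" disables newline translation so "\r\n" is written verbatim
    with open(path, "a", encoding=encoding, newline="") as f:
        f.write(text + "\r\n")


path = os.path.join(tempfile.mkdtemp(), "test.txt")
append_line(path, "appended using add-content")
append_line(path, "appended using redirect")

with open(path, "rb") as f:
    raw = f.read()
print(raw)  # no 0x00 bytes: both lines share one encoding
```

Because both writes use the same encoding, no stray NUL bytes appear.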

  • Alternatively, pick a different encoding with Add-Content -Encoding ... and match it in the Out-File -Append call; UTF-8 is generally the best choice, though note that when you create a UTF-8 file in Windows PowerShell, it will start with a BOM (a pseudo byte-order mark identifying the file as UTF-8, which Unix-like platforms typically do not expect).
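The BOM caveat can be seen in a short Python sketch (an analogue, not PowerShell). It assumes that a UTF-8 file written by Windows PowerShell looks like Python's "utf-8-sig" output: the bytes EF BB BF followed by plain UTF-8. A reader that expects BOM-less UTF-8 keeps the BOM as a stray U+FEFF character.

```python
import codecs

text = "appended\n"

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF) when encoding
with_bom = text.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))   # True

# A plain UTF-8 reader keeps the BOM as a leading U+FEFF character
print(repr(with_bom.decode("utf-8")))

# A BOM-aware reader strips it
print(repr(with_bom.decode("utf-8-sig")))
```

The middle line shows the invisible extra character that BOM-unaware tools, such as many on Unix-like platforms, end up with.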

In PowerShell v5.1+ you may also change the default encoding globally, including for > and >> (which isn't possible in earlier versions). To change to UTF-8, for instance, use:
$PSDefaultParameterValues['*:Encoding']='UTF8'

Aside from different default encodings (in Windows PowerShell), it is important to note that Set-Content / Add-Content on the one hand and > / >> / Out-File [-Append] on the other behave fundamentally differently with non-string input:

In short: the former apply simple .ToString()-formatting to the input objects, whereas the latter perform the same output formatting you would see in the console - see this answer for details.

[1] Due to the initial content set by Add-Content, Windows PowerShell interprets the file as ANSI-encoded (the default in the absence of a BOM), where each byte is its own character. The UTF-16 content appended later is therefore also interpreted as if it were ANSI, so the 0x0 bytes are treated like characters in their own right, which print to the console like spaces.
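The footnote's effect can be reproduced in Python. The sketch rebuilds the mixed file's bytes, decodes all of them as ANSI (assumed to be "cp1252"), and replaces each NUL with a space to mimic how the console rendered it. The replacement step is a simulation, not what PowerShell literally does.

```python
# Assumption: the ANSI code page is Windows-1252 ("cp1252").
ansi_part = "appended using add-content\r\n".encode("cp1252")     # written by Add-Content
utf16_part = "appended using redirect\r\n".encode("utf-16-le")    # appended by >> (no BOM)
mixed = ansi_part + utf16_part

# No BOM, so the whole file is read as ANSI: each byte becomes one character
decoded = mixed.decode("cp1252")
parts = decoded.split("\n")
second_line = parts[1]

print(second_line.count("\x00"))  # every UTF-16 code unit contributed one NUL

# Simulate the console showing each NUL like a space
rendered = second_line.rstrip("\r\x00").replace("\x00", " ")
print(rendered)
```

The rendered line matches the spaced-out cat output in the question, and the leftover NUL after the final newline matches the trailing '\x00' item in the Python output.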
