Different behaviour and output when piping through CMD and PowerShell

Question

I am trying to pipe the content of a file to a simple ASCII symmetric encryption program I made. It's a simple program that reads input from STDIN and adds a certain value (224) to each byte of the input, or subtracts it from each byte. For example: if the first byte is 4 and we want to encrypt, it becomes 228. If the result exceeds 255, the program wraps it around with a modulo operation.

This is the output I get with cmd (test.txt contains "this is a test"):

    type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
    this is a test

It also works the other way round, so it is a symmetric encryption algorithm:

    type .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt
    this is a test

However, the behaviour in PowerShell is different. When encrypting first, I get:

    type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt
    this is a test_*

And that is what I get when decrypting first:

Maybe it is an encoding problem. Thanks in advance.

Recommended answer

tl;dr:

If you need raw byte handling and/or need to prevent PowerShell from situationally adding a trailing newline to your text data, avoid the PowerShell pipeline altogether.
Instead, shell out to cmd with /c:

    cmd /c 'type .\test.txt | .\Crypt.exe --encrypt | .\Crypt.exe --decrypt'

Note that if you want to capture the output in a PowerShell variable, you need to make sure that [Console]::OutputEncoding matches your .\Crypt.exe program's (effective) output encoding (the active OEM code page), which should be true by default in this case; see the next section for details.

Generally, however, byte manipulation of text data is best avoided.

There are two separate problems, only one of which has a simple solution:

Problem 1: There is indeed a character encoding problem, as you suspected:

PowerShell invisibly inserts itself as an intermediary in pipelines, even when sending data to and receiving data from external programs: It converts data from and to .NET strings (System.String), which are sequences of UTF-16 code units.

  • As an aside: Even when using only PowerShell-native commands, this means that reading input from files and saving them again can result in a different character encoding, because the information about the original character encoding is not preserved once (string) data has been read into memory, and on saving it is the cmdlets' default character encoding that is used; while this default encoding is consistently BOM-less UTF-8 in PowerShell [Core] 6+, it varies by cmdlet in Windows PowerShell - see this answer.

In order to send to and receive data from external programs (such as Crypt.exe in your case), you need to match their character encoding; in your case, with a Windows console application that uses raw byte handling, the implied encoding is the system's active OEM code page.

  • On sending data, PowerShell uses the encoding of the $OutputEncoding preference variable to encode (what is invariably treated as text) data, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell [Core].

The receiving end is covered by default: PowerShell uses [Console]::OutputEncoding (which itself reflects the code page reported by chcp) for decoding data received, and on Windows this by default reflects the active OEM code page, both in Windows PowerShell and PowerShell [Core][1].

To fix your primary problem, you therefore need to set $OutputEncoding to the active OEM code page:

    # Make sure that PowerShell uses the OEM code page when sending
    # data to `.\Crypt.exe`
    $OutputEncoding = [Console]::OutputEncoding
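
To illustrate why the sending side matters, here is a minimal sketch that uses only .NET encoding classes. It is an illustration rather than part of the original diagnosis, and the variable names are made up for the example. It encodes the two characters φΩ the way Windows PowerShell's default ASCII `$OutputEncoding` would, and compares the result with BOM-less UTF-8:

```powershell
# Build the string "φΩ" from code points so the script file's own encoding doesn't matter.
$text = -join ([char] 0x03C6, [char] 0x03A9)

# ASCII cannot represent either character, so each one is replaced by '?' (byte 63).
$asciiBytes = [System.Text.Encoding]::ASCII.GetBytes($text)
$asciiBytes -join ','        # 63,63

# BOM-less UTF-8 keeps the information, using two bytes per character here.
$utf8Bytes = [System.Text.UTF8Encoding]::new($false).GetBytes($text)
$utf8Bytes -join ','         # 207,134,206,169

# Decoding the ASCII bytes again shows the data loss.
[System.Text.Encoding]::ASCII.GetString($asciiBytes)   # ??
```

Any character outside the 7-bit ASCII range is lost in this way before the external program ever sees it, which is why the encoding used for sending has to be able to represent the data.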


Problem 2: PowerShell invariably appends a trailing newline to data that doesn't already have one when piping data to external programs:

That is, "foo" | .\Crypt.exe doesn't send (the $OutputEncoding-encoded bytes representing) "foo" to .\Crypt.exe's stdin, it sends "foo`r`n" on Windows; i.e., a (platform-appropriate) newline sequence (CRLF on Windows) is automatically and invariably appended (unless the string already happens to have a trailing newline).
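
As a rough model of this behaviour, the following sketch computes the byte sequence described above instead of observing a real external program. It is a sketch under that assumption, and the variable names are made up for the example:

```powershell
# Model of what PowerShell sends to an external program's stdin for the string 'foo':
# the text plus a platform-appropriate newline, encoded with $OutputEncoding.
$payload = 'foo' + [Environment]::NewLine
$bytes = $OutputEncoding.GetBytes($payload)

# Show the bytes in hex; on Windows with a default $OutputEncoding: 66 6F 6F 0D 0A
($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
```

On Unix-like platforms the newline is a single LF, so the last two bytes shown for Windows collapse into one `0A`.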

This behavior is discussed in this GitHub issue and in this answer.

In your specific case, the implicitly appended "`r`n" is also subject to the byte-value shifting, which means that the 1st Crypt.exe call transforms it to -*, causing another "`r`n" to be appended when the data is sent to the 2nd Crypt.exe call.

The net result is an extra newline that is round-tripped (the intermediate -*), plus an encrypted newline (which results in φΩ).
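
The arithmetic behind -* and φΩ can be checked with a small model of the program. This is a sketch only: it assumes Crypt.exe adds 224 to encrypt and subtracts 224 to decrypt, modulo 256, as described in the question. The function name `Convert-ShiftedByte` is hypothetical, and the real program's source is not shown.

```powershell
# Hypothetical model of Crypt.exe's arithmetic (assumption: shift by 224, modulo 256).
function Convert-ShiftedByte {
    param([byte[]] $Bytes, [int] $Offset)
    # Add the offset, then wrap the result into the 0..255 range.
    [byte[]] ($Bytes | ForEach-Object { (($_ + $Offset) % 256 + 256) % 256 })
}

# The CRLF sequence that PowerShell appends.
$crlf = [byte[]] (13, 10)

# Decrypting CRLF yields bytes 45 and 42, i.e. the characters "-*".
[byte[]] $decrypted = Convert-ShiftedByte -Bytes $crlf -Offset (-224)
[System.Text.Encoding]::ASCII.GetString($decrypted)    # -*

# Encrypting CRLF yields bytes 237 and 234 (0xED, 0xEA).
[byte[]] $encrypted = Convert-ShiftedByte -Bytes $crlf -Offset 224
$encrypted -join ','                                    # 237,234
```

Bytes 0xED and 0xEA are displayed as φ and Ω when the active OEM code page is 437. That code page is assumed here, and other OEM code pages show different characters.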

In short: If your input data had no trailing newline, you'll have to cut off the last 4 characters from the result (representing the round-tripped and the inadvertently encrypted newline sequences):

    # Ensure that data sent to .\Crypt.exe is encoded with the active OEM code page.
    $OutputEncoding = [Console]::OutputEncoding

    # Invoke the command and capture its output in variable $result.
    # Note the use of the `Get-Content` cmdlet; in PowerShell, `type`
    # is simply a built-in *alias* for it.
    $result = Get-Content .\test.txt | .\Crypt.exe --decrypt | .\Crypt.exe --encrypt

    # Remove the last 4 chars. and print the result.
    $result.Substring(0, $result.Length - 4)

Given that calling cmd /c as shown at the top of the answer works too, that hardly seems worth it.

Unlike cmd (or POSIX-like shells such as bash):

  • PowerShell doesn't support raw byte data in pipelines.[2]
  • When talking to external programs, it only knows text (whereas it passes .NET objects when talking to PowerShell's own commands, which is where much of its power comes from).

Specifically, this works as follows:

  • When you send data to an external program via the pipeline (to its stdin stream):

  • It is converted to text (strings) using the character encoding specified in the $OutputEncoding preference variable, which defaults to ASCII(!) in Windows PowerShell, and (BOM-less) UTF-8 in PowerShell [Core].

  • Caveat: If you assign an encoding with a BOM to $OutputEncoding, PowerShell (as of v7.0) will emit the BOM as part of the first line of output sent to an external program; therefore, for instance, do not use [System.Text.Encoding]::Utf8 (which emits a BOM) in Windows PowerShell, and use [System.Text.Utf8Encoding]::new($false) (which doesn't) instead.
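
The difference between the two UTF-8 encoding objects named in the caveat can be inspected directly. This is a minimal sketch, and the variable names are made up for the example:

```powershell
# [System.Text.Encoding]::UTF8 is configured with a 3-byte BOM (EF BB BF) as its preamble.
$withBom = [System.Text.Encoding]::UTF8
($withBom.GetPreamble() | ForEach-Object { '{0:X2}' -f $_ }) -join ' '   # EF BB BF

# An instance constructed with $false has an empty preamble, so no BOM is emitted.
$noBom = [System.Text.UTF8Encoding]::new($false)
$noBom.GetPreamble().Length    # 0

# Assigning the BOM-less instance avoids sending a BOM to external programs.
$OutputEncoding = $noBom
```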

If the data is not captured or redirected by PowerShell, encoding problems may not always become apparent, namely if an external program is implemented in a way that uses the Windows Unicode console API to print to the display.

Something that isn't already text (a string) is stringified using PowerShell's default output formatting (the same format you see when you print to the console), with an important caveat:

  • If the (last) input object already is a string that doesn't itself have a trailing newline, one is invariably appended (and even an existing trailing newline is replaced with the platform-native one, if different).
  • This behavior can cause problems, as discussed in this GitHub issue and also in this answer.

When you capture / redirect data from an external program (from its stdout stream), it is invariably decoded as lines of text (strings), based on the encoding specified in [Console]::OutputEncoding, which defaults to the active OEM code page on Windows (surprisingly, in both PowerShell editions, as of v7.0-preview6[1]).

Internally, PowerShell represents text using the .NET System.String type, which is based on UTF-16 code units (often loosely, but incorrectly, called "Unicode"[3]).

The above also applies:

  • when piping data between external programs,

  • when data is redirected to a file; that is, irrespective of the source of the data and its original character encoding, PowerShell uses its default encoding(s) when sending data to files; in Windows PowerShell, > produces UTF-16LE-encoded files (with BOM), whereas PowerShell [Core] sensibly defaults to BOM-less UTF-8 (consistently, across file-writing cmdlets).
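
The redirection behaviour can be checked on a given system by looking at the raw bytes that > writes. This is a minimal sketch that uses a temporary file, and the file name is made up for the example:

```powershell
# Write a short string to a temporary file with the > redirection operator.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'redirect-demo.txt'
'abc' > $path

# Inspect the raw bytes that ended up in the file.
$bytes = [System.IO.File]::ReadAllBytes($path)
($bytes | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
# Per the description above, expect a leading FF FE (UTF-16LE BOM) in Windows PowerShell
# and plain 61 62 63 followed by a newline in PowerShell [Core].

# Clean up the temporary file.
Remove-Item -LiteralPath $path
```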

Adding support for raw data passing between external programs and to-file redirections is the subject of this GitHub issue.

[1] In PowerShell [Core], given that $OutputEncoding commendably already defaults to UTF-8, it would make sense to have [Console]::OutputEncoding be the same - i.e., for the active code page to be effectively 65001 on Windows, as suggested in this GitHub issue.

[2] With input from a file, the closest you can get to raw byte handling is to read the file as a .NET System.Byte array with Get-Content -AsByteStream (PowerShell [Core]) / Get-Content -Encoding Byte (Windows PowerShell), but the only way you can further process such an array is to pipe it to a PowerShell command that is designed to handle a byte array, or to pass it to a .NET type's method that expects a byte array. If you tried to send such an array to an external program via the pipeline, each byte would be sent as its decimal string representation on its own line.
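
A minimal sketch of such in-PowerShell byte handling follows. The file name is made up for the example, and the shift by 224 modulo 256 is assumed from the question's description of Crypt.exe. The file is read as bytes, processed inside PowerShell, and written back with a .NET method, so no external-program pipeline is involved:

```powershell
# Create a small sample file containing the ASCII bytes of "test".
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bytes-demo.bin'
[System.IO.File]::WriteAllBytes($path, [byte[]] (116, 101, 115, 116))

# Read the file as bytes; the parameter differs between the two editions.
if ($PSVersionTable.PSEdition -eq 'Core') {
    [byte[]] $bytes = Get-Content -LiteralPath $path -AsByteStream
} else {
    [byte[]] $bytes = Get-Content -LiteralPath $path -Encoding Byte
}

# Process the array inside PowerShell (here: add 224 modulo 256).
[byte[]] $shifted = $bytes | ForEach-Object { ($_ + 224) % 256 }
$shifted -join ','      # 84,69,83,84

# Write the result back as raw bytes with a .NET method, then clean up.
[System.IO.File]::WriteAllBytes($path, $shifted)
Remove-Item -LiteralPath $path
```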

[3] Unicode is the name of the abstract standard describing a "global alphabet". In concrete use, it has various standard encodings, UTF-8 and UTF-16 being the most widely used.
