How to expand file content with PowerShell


Problem description

I want to do this:

$content = get-content "test.html"
$template = get-content "template.html"
$template | out-file "out.html"


where template.html contains

<html>
  <head>
  </head>
  <body>
    $content
  </body>
</html>

and test.html contains:

<h1>Test Expand</h1>
<div>Hello</div>

I get weird characters in the first 2 characters of out.html:

    ��

and content is not expanded.

How can I fix this?

Solution

To complement Mathias R. Jessen's helpful answer with a solution that:

  • is more efficient.
  • ensures that the input files are read as UTF-8, even if they don't have a (pseudo-)BOM (byte-order mark).
  • avoids the "weird character" problem altogether by writing a UTF-8-encoded output file without that pseudo-BOM.

# Explicitly read the input files as UTF-8, as a whole.
$content =  get-content -raw -encoding utf8 test.html
$template = get-content -raw -encoding utf8 template.html

# Write to output file using UTF-8 encoding *without a BOM*.
[IO.File]::WriteAllText(
  "$PWD/out.html",
  $ExecutionContext.InvokeCommand.ExpandString($template)
)

  • get-content -raw (PSv3+) reads the files in as a whole, into a single string (instead of an array of strings, line by line), which, while more memory-intensive, is faster. With HTML files, memory usage shouldn't be a concern.

    • An additional advantage of reading the files in full is that if the template were to contain multi-line subexpressions ($(...)), the expansion would still function correctly.
  • get-content -encoding utf8 ensures that the input files are interpreted as using character encoding UTF-8, as is typical in the web world nowadays.

    • This is crucial, given that UTF-8-encoded HTML files normally do not have the 3-byte pseudo-BOM that PowerShell needs in order to correctly identify a file as UTF-8-encoded (see below).
  • A single $ExecutionContext.InvokeCommand.ExpandString() call is then sufficient to perform the template expansion.

  • Out-File -Encoding utf8 would invariably create a file with the pseudo-BOM, which is undesired.
    Instead, [IO.File]::WriteAllText() is used, taking advantage of the fact that the .NET Framework by default creates UTF-8-encoded files without the BOM.

    • Note the use of $PWD/ before out.html, which is needed to ensure that the file gets written in PowerShell's current location (directory); unfortunately, the .NET Framework's notion of the current directory is not kept in sync with PowerShell's.
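The substitution step itself is not PowerShell-specific. As a rough illustration only (this is not the answer's method), the same idea can be sketched in Python with `string.Template`, which replaces `$content` placeholders but, unlike `ExpandString()`, does not evaluate arbitrary embedded code:

```python
from string import Template

# Stand-ins for the two files read above (hypothetical inline strings).
content = "<h1>Test Expand</h1>\n<div>Hello</div>"
template = "<html>\n  <body>\n    $content\n  </body>\n</html>"

# Substitute $content into the template, analogous to
# $ExecutionContext.InvokeCommand.ExpandString($template).
result = Template(template).substitute(content=content)
print(result)
```

Because `string.Template` performs pure text substitution, it sidesteps the code-execution risk noted below; `ExpandString()` is more powerful (it also evaluates `$(...)` subexpressions), which is precisely why it needs trusted input.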

Finally, the obligatory security warning: use this expansion technique only on input that you trust, given that arbitrary embedded commands may get executed.


Optional background information

PowerShell's Out-File, > and >> use UTF-16 LE character encoding with a BOM (byte-order mark) by default (the "weird characters", as mentioned).
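The asker's `��` is exactly what that UTF-16 LE BOM looks like when the file is later read back as UTF-8: the bytes `FF FE` are invalid in UTF-8, so each becomes the U+FFFD replacement character. A quick Python sketch (illustrative only) reproduces it:

```python
# The UTF-16 LE BOM that Out-File prepends by default.
utf16le_bom = b"\xff\xfe"

# Reading those bytes back as UTF-8: both bytes are invalid UTF-8,
# so each decodes to U+FFFD, the replacement character "�".
weird = utf16le_bom.decode("utf-8", errors="replace")
print(weird)  # ��
```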

While Out-File -Encoding utf8 allows creating UTF-8 output files instead,
PowerShell invariably prepends a 3-byte pseudo-BOM to the output file, which some utilities, notably those with Unix heritage, have problems with - so you would still get "weird characters" (albeit different ones).
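The 3-byte pseudo-BOM is just U+FEFF encoded as UTF-8, and a BOM-unaware tool that assumes a single-byte encoding shows it as stray leading characters. This can be verified in Python (illustrative sketch, not part of the PowerShell solution):

```python
# U+FEFF encoded as UTF-8 yields the 3-byte pseudo-BOM.
utf8_bom = "\ufeff".encode("utf-8")
print(utf8_bom)  # b'\xef\xbb\xbf'

# A tool that reads the file with a legacy single-byte codepage
# (e.g. Windows-1252) sees the BOM as three stray characters:
print(utf8_bom.decode("cp1252"))  # ï»¿
```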

If you want a more PowerShell-like way of creating BOM-less UTF-8 files, see this answer of mine, which defines an Out-FileUtf8NoBom function that otherwise emulates the core functionality of Out-File.

Conversely, on reading files, you must use Get-Content -Encoding utf8 to ensure that BOM-less UTF-8 files are recognized as such.
In the absence of the UTF-8 pseudo-BOM, Get-Content assumes that the file uses the single-byte, extended-ASCII encoding specified by the system's legacy codepage (e.g., Windows-1252 on English-language systems, an encoding that PowerShell calls Default).
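That fallback is what produces classic mojibake: each multi-byte UTF-8 sequence is misread as two or more Windows-1252 characters. A small Python sketch (for illustration) shows the effect:

```python
text = "café"

# BOM-less UTF-8, as is typical for web files: "é" becomes the
# two bytes C3 A9.
raw = text.encode("utf-8")

# What a reader defaulting to the legacy Windows-1252 codepage sees:
# C3 -> "Ã", A9 -> "©".
misread = raw.decode("cp1252")
print(misread)  # cafÃ©
```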

Note that while Windows-only editors such as Notepad create UTF-8 files with the pseudo-BOM (if you explicitly choose to save as UTF-8; default is the legacy codepage encoding, "ANSI"), increasingly popular cross-platform editors such as Visual Studio Code, Atom, and Sublime Text by default do not use the pseudo-BOM when they create files.
