Why does PowerShell file concatenation convert UTF-8 to UTF-16?
Question
I am running the following PowerShell script to concatenate a series of output files into a single CSV file. The input files are named whidataXX.htm (where XX is a two-digit sequential number), and the number of files created varies from run to run.
    $metadataPath = "\\ServerPath\foo"
    function concatenateMetadata {
        $cFile = $metadataPath + "whiconcat.csv"
        Clear-Content $cFile
        $metadataFiles = gci $metadataPath
        $iterations = $metadataFiles.Count
        for ($i=0;$i -le $iterations-1;$i++) {
            $iFile = "whidata"+$i+".htm"
            $FileExists = (Test-Path $metadataPath$iFile -PathType Leaf)
            if (!($FileExists))
            {
                break
            }
            elseif ($FileExists)
            {
                Write-Host "Adding " $metadataPath$iFile
                Get-Content $metadataPath$iFile | Out-File $cFile -append
                Write-Host "to" $cfile
            }
        }
    }
The whidataXX.htm files are encoded as UTF-8, but my output file is encoded as UTF-16. When I view the file in Notepad it appears correct, but when I view it in a hex editor, the hex value 00 appears between each character, and when I pull the file into a Java program for processing, the file prints to the console with extra spaces between c h a r a c t e r s.
First, is this normal for PowerShell, or is there something in the source files that would cause this?
Second, how would I fix this encoding problem in the code noted above?
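Recommended answer
The Out-* cmdlets (like Out-File) format the data before writing it, and in Windows PowerShell their default encoding is "Unicode", which is PowerShell's name for UTF-16 little-endian. So this is normal behavior and is not caused by anything in the source files. (PowerShell 6 and later default to UTF-8 without a BOM instead.)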
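To see the difference directly, a small sketch like the following can dump the first bytes of two files written with and without an explicit encoding. The helper name Get-FirstBytesHex and the temporary file names are made up for this illustration; only Out-File and the .NET System.IO.File calls are standard.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the first few bytes of a
# file as a space-separated hex string, e.g. "FF FE 61 00".
function Get-FirstBytesHex {
    param(
        [string]$Path,
        [int]$Count = 8
    )
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    $n = [Math]::Min($Count, $bytes.Length)
    if ($n -eq 0) { return '' }
    ($bytes[0..($n - 1)] | ForEach-Object { '{0:X2}' -f $_ }) -join ' '
}

# Temporary file names chosen only for this demo.
$tempDir     = [System.IO.Path]::GetTempPath()
$defaultFile = Join-Path $tempDir 'encoding-default.txt'
$utf8File    = Join-Path $tempDir 'encoding-utf8.txt'

# Same text, once with Out-File's default encoding and once with UTF-8.
'abc' | Out-File $defaultFile
'abc' | Out-File $utf8File -Encoding UTF8

Write-Host "default:" (Get-FirstBytesHex -Path $defaultFile)
Write-Host "utf8   :" (Get-FirstBytesHex -Path $utf8File)
```

On Windows PowerShell the "default" line should begin with the UTF-16 byte order mark FF FE followed by 61 00 62 00 63 00, which is the 00-between-characters pattern described in the question, while the UTF-8 file contains single bytes 61 62 63 after its BOM. The exact output depends on the PowerShell version.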
You can add an -Encoding parameter to Out-File:
    Get-Content $metadataPath$iFile | Out-File $cFile -Encoding UTF8 -append
Or switch to Add-Content, which doesn't re-format the data:
    Get-Content $metadataPath$iFile | Add-Content $cFile
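Putting this together, one possible shape for the whole concatenation step is sketched below. It is not the asker's original function: the name Join-MetadataFiles and its parameters are invented for this example, it selects inputs by the whidata*.htm pattern rather than counting up until a file is missing, and it assumes the two-digit numbers are zero-padded so that sorting by name gives the intended order. The encoding is passed explicitly on both the read and the write so the result does not depend on version-specific defaults.

```powershell
# Hypothetical rewrite of the concatenation step; names are illustrative only.
function Join-MetadataFiles {
    param(
        [string]$SourceDir,                    # folder holding whidataXX.htm
        [string]$OutputName = 'whiconcat.csv'  # output file name from the question
    )

    # Join-Path inserts the path separator, so $SourceDir needs no trailing "\".
    $outFile = Join-Path $SourceDir $OutputName

    # Assumption: zero-padded numbers (whidata00.htm, whidata01.htm, ...), so a
    # plain sort by name yields the sequence order.
    $inputFiles = Get-ChildItem -Path $SourceDir -Filter 'whidata*.htm' |
        Sort-Object Name

    # Read every input as UTF-8 and write the combined lines as UTF-8 in one
    # call, which also replaces any previous output file.
    $inputFiles |
        ForEach-Object { Get-Content -Path $_.FullName -Encoding UTF8 } |
        Set-Content -Path $outFile -Encoding UTF8
}

# Example call, using the placeholder share from the question:
# Join-MetadataFiles -SourceDir "\\ServerPath\foo"
```

Writing the output with a single Set-Content call avoids the separate Clear-Content step and the per-file append. If appending file by file is preferred, the same -Encoding UTF8 argument can be given to Add-Content.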