Read UTF-8 files correctly with PowerShell

Question

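Consider the following scenario:

  • A PowerShell script creates a file with UTF-8 encoding
  • The user may or may not edit the file, possibly losing the BOM, but should keep the encoding as UTF-8, and may change the line separators
  • The same PowerShell script reads the file, adds some more content, and writes everything back to the same file as UTF-8
  • This can be repeated many times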

With Get-Content and Out-File -Encoding UTF8 I cannot read the file back correctly. The script stumbles over the BOM it wrote earlier (the BOM ends up in the content and breaks my parsing regex), it does not use UTF-8 encoding, and it even deletes line breaks in the original content.

I need a function that can read any UTF-8 encoded file, ignore and drop the BOM, and leave the content unmodified. What should I use?

Update

I have added a small test script that shows what I am trying to do and what happens instead.

# Read data if exists
$data = ""
$startRev = 1;
if (Test-Path test.txt)
{
    $data = Get-Content -Path test.txt
    if ($data -match "^[0-9-]{10} - r([0-9]+)")
    {
        $startRev = [int]$matches[1] + 1
    }
}
Write-Host Next revision is $startRev

# Define example data to add
$startRev = $startRev + 10
$newMsgs = "2014-04-01 - r" + $startRev + "`r`n`r`n" + `
    "Line 1`r`n" + `
    "Line 2`r`n`r`n"

# Write new data back
$data = $newMsgs + $data
$data | Out-File test.txt -Encoding UTF8

After running it a few times, new sections should have been added to the beginning of the file, the existing content should not be altered in any way (currently it loses line breaks), and no additional newlines should be added at the end of the file (this seems to happen sometimes).

Instead, the second run gives me an error.
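
Editorial note: the post does not quote the error message, so the following is only a likely explanation, assuming default PowerShell behavior. Without -Raw, Get-Content returns an array of lines. Against an array, -match filters the elements instead of populating $Matches, and concatenating a string with an array joins the elements with spaces. The sketch below reproduces both effects with a hypothetical in-memory stand-in for the file content:

```powershell
# Hypothetical stand-in for what Get-Content returns for a multi-line file
$lines = @('2014-04-01 - r11', '', 'Line 1', 'Line 2')

# With an array on the left, -match returns the matching elements.
# It does not fill $Matches, so indexing $Matches[1] afterwards can fail.
$result = $lines -match '^[0-9-]{10} - r([0-9]+)'

# string + array: the array is converted to a string, with its
# elements joined by spaces, so the line breaks are gone.
$joined = 'new' + $lines
# $joined is 'new2014-04-01 - r11  Line 1 Line 2'
```

Reading the file as a single string avoids both effects.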

Recommended answer

If the file is supposed to be UTF-8, why not decode it as UTF-8 when reading it:

Get-Content -Path test.txt -Encoding UTF8
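
As a complement to the one-liner above, here is a minimal sketch of the full read, prepend, and write round trip described in the question. It is not from the original answer. It swaps Get-Content and Out-File for the .NET System.IO.File methods, which return the file as one string, skip a leading BOM, and can write UTF-8 without a BOM. The helper names Read-Utf8Text and Write-Utf8Text are hypothetical.

```powershell
# Hypothetical helpers, not part of the original question or answer.
function Read-Utf8Text {
    param([Parameter(Mandatory = $true)][string]$Path)
    # .NET resolves relative paths against the process directory,
    # so convert to a full path from PowerShell's location first.
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    # Decodes as UTF-8, skips a leading BOM if present, and returns
    # one string with the original line breaks untouched.
    return [System.IO.File]::ReadAllText($full, [System.Text.Encoding]::UTF8)
}

function Write-Utf8Text {
    param([Parameter(Mandatory = $true)][string]$Path, [string]$Text)
    $full = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Path)
    # $false = do not emit a BOM; nothing is appended to $Text.
    $noBom = New-Object System.Text.UTF8Encoding $false
    [System.IO.File]::WriteAllText($full, $Text, $noBom)
}

# Usage, following the test script from the question
$path = 'test.txt'
$data = ''
$startRev = 1
if (Test-Path -LiteralPath $path) {
    $data = Read-Utf8Text -Path $path
    # $data is a single string, so -match populates $Matches
    if ($data -match '^[0-9-]{10} - r([0-9]+)') {
        $startRev = [int]$Matches[1] + 1
    }
}
$newMsgs = "2014-04-01 - r$startRev`r`n`r`nLine 1`r`nLine 2`r`n`r`n"
Write-Utf8Text -Path $path -Text ($newMsgs + $data)
```

Because the text is handled as one string from start to finish, whatever line separators the user left in the file are written back unchanged. If a BOM is wanted on output, passing $true to the UTF8Encoding constructor should restore it.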
