How to Remove Special/Bad Characters from XML Using PowerShell
Question
I have an XML file, and I want to remove the characters that cause hexadecimal character errors. The invalid character shows up as STX.
I don't know what STX means, and when I copy it to the clipboard and paste it into MS Word, it shows a different value.
How can I write a PowerShell script to remove these characters from my XML file?
Recommended answer
STX is the ASCII control character U+0002 (Start of Text), which XML 1.0 does not allow. The following regex removes invalid characters from XML text by matching every character outside the set that XML 1.0 permits: tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF:
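# .NET regex: \x takes exactly two hex digits, so code points above U+00FF need \uXXXX.
# Surrogates are kept only as a high+low pair (the characters U+10000 to U+10FFFF).
$rPattern = '[^\x09\x0A\x0D\x20-\uFFFD]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]'
$xmlText -replace $rPattern,''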
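A quick way to check a pattern like this is to run it against a small sample string. The sketch below is only an illustration: the sample text and the variable names `$stx`, `$emoji`, and `$clean` are made up for the example, and the pattern writes the XML 1.0 ranges with `\u` escapes because .NET's `\x` escape reads only two hex digits.

```powershell
# Hypothetical sample: an XML fragment containing STX (U+0002), an accented
# letter (U+00E9), and an emoji stored as a surrogate pair (U+1F600).
$stx     = [char]0x02
$emoji   = [char]::ConvertFromUtf32(0x1F600)
$xmlText = "<note>bad${stx}char, good $([char]0xE9) and $emoji</note>"

# Characters not allowed by XML 1.0, plus unpaired surrogates
$rPattern = '[^\x09\x0A\x0D\x20-\uFFFD]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]'

# List the code point of each character that would be removed
[regex]::Matches($xmlText, $rPattern) |
    ForEach-Object { 'U+{0:X4}' -f [int][char]$_.Value }

# Strip them and confirm the result now parses as XML
$clean = $xmlText -replace $rPattern, ''
$doc   = [xml]$clean
$doc.note
```

Only the STX character should be reported; the accented letter and the emoji are valid XML characters and stay in place.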
This can easily be turned into a simple function:
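function Repair-XmlString
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true,Position=0)]
        [string]$inXML
    )
    # Match every UTF-16 code unit that does NOT belong in an XML 1.0 document:
    # control characters other than tab/LF/CR, U+FFFE, U+FFFF, and unpaired surrogates.
    # \uXXXX is used because .NET's \x escape takes exactly two hex digits.
    $rPattern = '[^\x09\x0A\x0D\x20-\uFFFD]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]'
    # Replace those characters with [String]::Empty and return the result
    return [System.Text.RegularExpressions.Regex]::Replace($inXML,$rPattern,"")
}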
Then call it like this:
Repair-XmlString (Get-Content path\to\file.xml -Raw) | Set-Content path\to\file.xml
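Overwriting the file in place cannot be undone, and in Windows PowerShell 5.1 `Get-Content` reads files without a byte order mark using the system ANSI code page unless an encoding is given. A more cautious variant might write to a separate file and confirm that the result parses first. The following is a minimal sketch under those assumptions: `Repair-XmlFile`, its `-Path` and `-Destination` parameters, and the UTF-8 encoding are hypothetical choices for illustration, not part of the original answer.

```powershell
# Hypothetical helper (not from the original answer): cleans a file into a
# separate destination and checks that the result parses before writing it.
function Repair-XmlFile
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true)][string]$Path,
        [Parameter(Mandatory=$true)][string]$Destination
    )
    # Characters not allowed by XML 1.0, plus unpaired surrogates
    $rPattern = '[^\x09\x0A\x0D\x20-\uFFFD]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]'

    # Assumption: the file is UTF-8 encoded; change -Encoding if yours is not
    $raw   = Get-Content -LiteralPath $Path -Raw -Encoding UTF8
    $clean = [regex]::Replace($raw, $rPattern, '')

    # Casting to [xml] throws if the cleaned text is still not well-formed,
    # so nothing is written in that case
    $null = [xml]$clean

    Set-Content -LiteralPath $Destination -Value $clean -Encoding UTF8 -NoNewline
}

# Example call with placeholder paths:
# Repair-XmlFile -Path path\to\file.xml -Destination path\to\file.clean.xml
```

Once the cleaned copy has been inspected, it can replace the original file.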