PowerShell Regex to replace XML tag values
Question
I'm trying to parse the following XML from a file using PowerShell without loading it as an XML document via [xml], because the document contains errors.
<data>
<company>Walter & Cooper</company>
<contact_name>Patrick O'Brian</contact_name>
</data>
To load the document successfully, I need to fix the errors by replacing special characters with their XML entities, as follows:
& with &amp;
< with &lt;
' with &apos; etc.
I know I could do something like this to find and replace characters in a document:
(Get-Content $fileName) | Foreach-Object {
$_ -replace '&', '&amp;' `
-replace "'", '&apos;' `
-replace '"', '&quot;'} | Set-Content $fileName
But this replaces the characters everywhere in the file. I'm only interested in characters inside the text of XML elements such as <company>, and in replacing them with XML-safe entities so that the resulting text is a valid document that I can load using [xml].
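To make the concern concrete, here is a minimal sketch of a blanket replace damaging the markup itself. The input line and its id attribute are hypothetical (the original sample has no attributes); they are only there to show quotes that belong to the markup rather than to the text.

```powershell
# Hypothetical input line: the id attribute is an assumption, not part of the original file.
$line = '<company id="1">Walter & Cooper</company>'

# Blanket replacement, applied to the whole line regardless of position.
$naive = $line -replace '&', '&amp;' -replace '"', '&quot;'

$naive
# -> <company id=&quot;1&quot;>Walter &amp; Cooper</company>
# The attribute delimiters were escaped too, so the tag is no longer well-formed.
```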
Recommended answer
Something like this should work for each character you need to replace:
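(Get-Content $fileName) | Foreach-Object {
$_ -replace '(?<=\W)(&)(?=.*<\/.*>)', '&amp;' `
-replace "(?<=\W)(')(?=.*<\/.*>)", '&apos;' `
-replace '(?<=\W)(")(?=.*<\/.*>)', '&quot;' `
-replace '(?<=\W)(>)(?=.*<\/.*>)', '&gt;' `
# Numeric reference for the asterisk operator; HTML's &lowast; is not predefined in XML.
-replace '(?<=\W)(\*)(?=.*<\/.*>)', '&#8727;' } | Set-Content $fileName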
Each pattern uses a positive look-behind for a non-word character, then a capturing group for the character to replace, followed by a positive look-ahead that requires a closing tag later on the same line.
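As a rough illustration, the `&` pattern can be tried on the sample lines from the question. The $lines array below is an assumption standing in for Get-Content $fileName. Note that the \W look-behind only matches a character preceded by a non-word character, so the apostrophe in O'Brian (preceded by the letter O) would not be matched by the apostrophe pattern; that happens to be harmless here, since an apostrophe in element text is already valid XML.

```powershell
# Sample lines from the question; the bare '&' is what makes the document invalid.
$lines = @(
    '<data>',
    '<company>Walter & Cooper</company>',
    "<contact_name>Patrick O'Brian</contact_name>",
    '</data>'
)

# Escape '&' only when it follows a non-word character
# and a closing tag appears later on the same line.
$fixed = $lines | ForEach-Object {
    $_ -replace '(?<=\W)(&)(?=.*<\/.*>)', '&amp;'
}

# The repaired text can now be cast to [xml].
$doc = [xml]($fixed -join "`n")
$doc.data.company
# -> Walter & Cooper
```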
Example:
Updated: http://regex101.com/r/aY8iV3 | Original: http://regex101.com/r/yO7wB1
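A different technique from the one in the answer, sketched here only as a possible alternative: match each simple element that opens and closes on one line, and escape its text in a single pass with [System.Security.SecurityElement]::Escape, which handles &, <, >, " and '. This sketch assumes elements have no attributes, are not nested on the same line, and that the text is not already partially escaped (existing entities would be escaped a second time).

```powershell
# Sample document from the question, held in a string for illustration.
$text = @'
<data>
<company>Walter & Cooper</company>
<contact_name>Patrick O'Brian</contact_name>
</data>
'@

# <tag>text</tag> on a single line; '.' does not cross newlines,
# so the multi-line <data> wrapper is left untouched.
$pattern = '<(\w+)>(.*?)</\1>'

# Rebuild each matched element with its text content escaped.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator]{
    param($m)
    $tag  = $m.Groups[1].Value
    $body = [System.Security.SecurityElement]::Escape($m.Groups[2].Value)
    "<$tag>$body</$tag>"
}

$fixed = [regex]::Replace($text, $pattern, $evaluator)
$doc = [xml]$fixed
$doc.data.contact_name
# -> Patrick O'Brian
```

Because the pattern consumes the whole element, the replacement never touches the tags themselves, which is the property the question asks for.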