PowerShell regex to replace XML tag values

Question

I'm trying to parse the following XML from a file using PowerShell without loading it as an XML document via [xml], since the document contains errors.

<data>
  <company>Walter & Cooper</company>
  <contact_name>Patrick O'Brian</contact_name>
</data>

To load the document successfully, I need to fix the errors by replacing special characters as follows:

& with &amp;
< with &lt;
' with &apos; etc.

I know I could do something like this to find and replace characters in a document

(Get-Content $fileName) | Foreach-Object {
  $_ -replace '&', '&amp;' `
     -replace "'", '&apos;' `
     -replace '"', '&quot;' } | Set-Content $fileName

But this replaces characters everywhere in the file. I'm only interested in characters inside XML elements such as <company>, and in replacing them with XML-safe entities so that the resulting text is a valid document I can load using [xml].

Recommended answer

Something like this should work for each character you need to replace:

(Get-Content $fileName) | Foreach-Object {
  $_ -replace '(?<=\W)(&)(?=.*<\/.*>)', '&amp;' `
     -replace '(?<=\W)('')(?=.*<\/.*>)', '&apos;' `
     -replace '(?<=\W)(")(?=.*<\/.*>)', '&quot;' `
     -replace '(?<=\W)(>)(?=.*<\/.*>)', '&gt;' `
     -replace '(?<=\W)(\*)(?=.*<\/.*>)', '&lowast;' } | Set-Content $fileName

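Each pattern uses a positive look-behind for a non-word character, then a capturing group for the character to replace, followed by a positive look-ahead that requires a closing tag later on the same line. Note that &lowast; is an HTML entity rather than one of XML's five predefined entities (&amp;, &lt;, &gt;, &apos;, &quot;), so [xml] will reject it unless a DTD declares it.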
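
To make the look-behind and look-ahead concrete, here is a small sketch that runs two of the patterns against the sample lines from the question. The variable names ($lookahead, $samples, $results) are illustrative and not part of the original answer. It also shows a limitation worth knowing: the apostrophe in O'Brian follows the word character O, so the (?<=\W) look-behind does not match it.

```powershell
# Shared look-ahead: a closing tag such as </company> must appear later on the line.
$lookahead = '(?=.*<\/.*>)'

$samples = @(
    '  <company>Walter & Cooper</company>'
    "  <contact_name>Patrick O'Brian</contact_name>"
)

$results = foreach ($line in $samples) {
    # The look-behind (?<=\W) requires a non-word character before the match.
    $line -replace "(?<=\W)(&)$lookahead", '&amp;' `
          -replace "(?<=\W)(')$lookahead", '&apos;'
}

$results
# First line:  the & follows a space, so it becomes &amp;
#   <company>Walter &amp; Cooper</company>
# Second line: the ' follows the letter O, so the look-behind fails and the line is unchanged
#   <contact_name>Patrick O'Brian</contact_name>
```

If apostrophes inside words need escaping too, the look-behind has to be relaxed or dropped for that character.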

Examples:

Updated: http://regex101.com/r/aY8iV3 | Original: http://regex101.com/r/yO7wB1
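
As an alternative to one pattern per character, the sketch below captures the text between an opening tag and its matching closing tag and escapes only that text with the .NET method [System.Security.SecurityElement]::Escape. This is a different technique from the look-around patterns above, not the original answer's method. The function name Repair-XmlLine is hypothetical, and the sketch assumes simple leaf elements without attributes that open and close on the same line, and text that is not already escaped (existing entities would be escaped a second time).

```powershell
# Hypothetical helper (not from the original answer): escapes the text of
# simple leaf elements that open and close on one line, e.g. <tag>text</tag>.
function Repair-XmlLine {
    param([string]$Line)

    # Group 1: opening tag, group 2: tag name, group 3: text, group 4: closing tag.
    $pattern = '(<(\w+)>)(.*?)(</\2>)'

    $evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
        param($m)
        # SecurityElement.Escape replaces & < > " ' with the matching XML entities.
        $m.Groups[1].Value +
            [System.Security.SecurityElement]::Escape($m.Groups[3].Value) +
            $m.Groups[4].Value
    }

    [regex]::Replace($Line, $pattern, $evaluator)
}

# Sample input taken from the question; a real script would use Get-Content $fileName.
$lines = @(
    '<data>'
    '  <company>Walter & Cooper</company>'
    "  <contact_name>Patrick O'Brian</contact_name>"
    '</data>'
)

$fixed = $lines | ForEach-Object { Repair-XmlLine $_ }

# The repaired text can now be parsed as XML.
$doc = [xml]($fixed -join "`n")
$doc.data.company   # Walter & Cooper (the parser decodes &amp; again)
```

Lines such as <data> that contain no complete element are returned unchanged, so the markup itself is never touched.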
