Use PowerShell to replace characters within a specific string

Question

I'm using a PowerShell script to automate the replacement of some troublesome characters in an XML file, such as &, ', – and £.

The script I have works well for these characters, but I also want to remove the double quote character (") when it appears inside an XML attribute value. Because attribute values are themselves enclosed in double quotes, I can't simply remove every double quote from the file; that would break the attributes.

My PowerShell script is below:

(Get-Content C:\test\communication.xml) | 
Foreach-Object {$_ -replace "&", "+" -replace "£", "GBP" -replace "'", "" -replace "–", " "} |
Set-Content C:\test\communication.xml

What I'd like to do is remove ONLY the double quotes that appear inside an XML attribute value, i.e. the ones enclosed by the pair of double quotes that delimit the attribute, as in the example below. I know that PowerShell processes each line as a separate object, so I suspect this should be quite easy, possibly by using conditions?

An example XML file is below:

<?xml version="1.0" encoding="UTF-8"?>
<Portal> 
<communication updates="Text data with no double quotes in the attribute" />
<communication updates="Text data that "includes" double quotes within the double quotes for the attribute" />
</Portal>

In the example above I'd like to remove only the double quotes immediately surrounding the word "includes", but not the double quote to the left of the word "Text" or the one to the right of the word "attribute". The text of the XML attributes will change regularly, but the opening double quote will always be immediately to the right of the = sign, and the closing double quote will always be immediately to the left of the space-and-forward-slash combination ( /). Thanks.
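Since the question pins down exactly where each attribute value starts and ends, a per-line condition is one possible approach. This is a sketch of an alternative, not part of the original post: it assumes, as in the sample, one attribute per line, opening with =" and closing with " /> at the end of the line. The function name Remove-InnerQuote and the variable names are hypothetical.

```powershell
# Hypothetical helper (not from the original post). It relies only on the layout the
# question describes: the value starts right after =" and ends right before " /> .
# Assumption: at most one attribute per line.
function Remove-InnerQuote {
    param([string]$Line)

    # Condition: only lines shaped like  ...="value" />  are rewritten
    if ($Line -match '^(.*?=")(.*)(" />\s*)$') {
        # $Matches[1]: everything up to and including the opening ="
        # $Matches[2]: the attribute value (greedy, so it runs to the last " />)
        # $Matches[3]: the closing " /> plus any trailing whitespace
        return $Matches[1] + ($Matches[2] -replace '"', '') + $Matches[3]
    }

    # Every other line (XML declaration, <Portal>, </Portal>) passes through unchanged
    return $Line
}

# Sample lines from the question's example XML
$sample = @(
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<Portal>'
    '<communication updates="Text data with no double quotes in the attribute" />'
    '<communication updates="Text data that "includes" double quotes within the double quotes for the attribute" />'
    '</Portal>'
)

$fixed = $sample | ForEach-Object { Remove-InnerQuote $_ }
$fixed
```

To apply it to the real file, swap $sample for (Get-Content C:\test\communication.xml) and pipe $fixed into Set-Content C:\test\communication.xml, as in the original script.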

Answer

Try this regex:

"(?<!\?xml.*)(?<=`".*?)`"(?=.*?`")"

In your code:

(Get-Content C:\test\communication.xml) | 
Foreach-Object {$_ -replace "&", "+" `
    -replace "£", "GBP" `
    -replace "'", "" `
    -replace "–", " " `
    -replace "(?<!\?xml.*)(?<=`".*?)`"(?=.*?`")", ""} |
Set-Content C:\test\communication.xml
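A quick way to check the pattern without touching the file is to run it over the sample lines from the question. This is a minimal sketch: the pattern is the one from the answer, but written as a single-quoted PowerShell string, so the backtick escapes are not needed; the variable names are illustrative only.

```powershell
# Sample lines copied from the question's example XML
$lines = @(
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<Portal>'
    '<communication updates="Text data with no double quotes in the attribute" />'
    '<communication updates="Text data that "includes" double quotes within the double quotes for the attribute" />'
    '</Portal>'
)

# Same pattern as the answer; inside single quotes, " needs no backtick escape
$pattern = '(?<!\?xml.*)(?<=".*?)"(?=.*?")'

# -replace removes every quote that satisfies both lookarounds on its line
$result = $lines | ForEach-Object { $_ -replace $pattern, '' }
$result
```

Only the fourth line changes; it becomes <communication updates="Text data that includes double quotes within the double quotes for the attribute" />, while the XML declaration and the line without inner quotes are left as they were.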

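This matches any " that has another " somewhere before it and somewhere after it on the same line (except on a line containing ?xml, i.e. the XML declaration) and replaces it with nothing. The opening quote after = and the closing quote before /> are kept, because each of them has a quote on only one side.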

Edit: here is a breakdown of the regex:

(?<!\?xml.*)(?<=`".*?)`"(?=.*?`")

1. (?<!\?xml.*)----> Negative lookbehind: skips any quote that has "?xml" earlier on the same line, so the XML declaration is left alone
2. (?<=`".*?)------> Lookbehind requiring a quotation mark somewhere earlier on the line.
       The backtick (`) escapes the quotation mark inside a PowerShell double-quoted string
3. `"--------------> The actual quotation mark you are searching for (this is what gets removed)
4. (?=.*?`")-------> Lookahead requiring a quotation mark somewhere later on the line
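One consequence of parts 2 and 4 is that the pattern has no notion of which attribute a quote belongs to; it only checks for quotes on either side. The question's sample has one attribute per line, so this is fine there, but if a line ever carried two attributes, the delimiters between them would be removed too. A minimal illustration, using a made-up line that is not in the question's file:

```powershell
# The answer's pattern, as a single-quoted string
$pattern = '(?<!\?xml.*)(?<=".*?)"(?=.*?")'

# Hypothetical line with two attributes (not taken from the question)
$twoAttrs = '<communication id="42" updates="Say "hi" please" />'

# Every quote except the very first and the very last has a quote on both sides,
# so all of them are removed, including the delimiters around id and updates
$broken = $twoAttrs -replace $pattern, ''
$broken   # <communication id="42 updates=Say hi please" />
```

So the pattern should only be relied on while each line carries a single attribute, as in the sample.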

For more information about lookbehinds and lookaheads see this site