DotAll and multiline RegEx

Problem description

I'm having a little trouble using regex in PowerShell. It seems like there is an implementation error or something.

The text I want to work with is an HTML file, which looks like this (Example 1):

<span>[Mobile: %mobile% |] Phone: %telephone% [| Fax: %faxNumber%]</span>
<Span>

The problem is that, because of HTML editors, I may also get something like this (Example 2):

<span>[Mobile: 

%mobile% |] Phone: %telephone% [| Fax: &nbsp;&nbsp;%faxNumber%]</span>

As you can see, we get line breaks and HTML-escaped non-breaking spaces (&nbsp;).

My PowerShell regex looks like this:

$x = $x -ireplace '(?ms)\[(.?){7}Fax(.*?)\]', 'MyReplacement1'

and this

$x = $x -ireplace '(?ms)\[(.?){7}Mobile(.*?)\]', 'MyReplacement2'

Basically, [ marks the beginning of a variable and ] marks its end. Two problems arise from this:

  1. Since we have two variables, mobile and fax, I'm using (.?){7} to allow SOME characters (here at most 7) and to avoid matching the whole part between the first [ near Mobile and the last ] near Fax (which is what happens if I use (.*?) instead of (.?){7}). I'm not sure whether there is an alternative that allows ANY number of characters (not just 7) between the opening [ and the variable keyword, "Fax" for example. This would help avoid mismatches when something like &nbsp;&nbsp; gets added (where 7 characters are not enough and, as I said, (.*?) fails). I hope I was able to explain it (it's kind of hard); if not, please feel free to ask!
  2. PowerShell's -replace operator doesn't offer a way to set regex options, so I have to use (?ms) to turn on DotAll and multiline modes. As you can see, I'm using it within my regex pattern. However, when a newline is added, as between the words Mobile: and %mobile% in Example 2, the regex fails and nothing gets replaced!
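A minimal sketch of both points, assuming the HTML is held in a single string (the variable names are made up for illustration). It shows why dropping {7} in favour of a plain lazy (.*?) over-matches, and that the inline (?s) flag on its own does let . cross a newline:

```powershell
# Example 1 from the question (first line only)
$example1 = '<span>[Mobile: %mobile% |] Phone: %telephone% [| Fax: %faxNumber%]</span>'

# Point 1: a lazy (.*?) without the {7} limit. The engine starts at the
# leftmost [ (the one before Mobile), and .*? keeps growing until it reaches
# "Fax", so the match runs across the first ] and swallows both variables.
$lazy = $example1 -ireplace '\[(.*?)Fax(.*?)\]', 'MyReplacement1'
$lazy   # <span>MyReplacement1</span>

# Point 2: (?s) is what lets . match a newline ("`n" inside double quotes).
$dotWithS    = "a`nb" -match '(?s)a.b'   # True
$dotWithoutS = "a`nb" -match 'a.b'       # False
```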

I'm grateful for any help, and also for regex recommendations from the pros on avoiding further problems I'm not thinking about right now.

EDIT (Example 3):

<span>[Mobile: 

%mobile% |] Phone: %telephone% [| Fax: 
%faxNumber%]</span>

Solution

The usual trick for doing without DotAll mode is to use [\s\S] instead of . (dot). This character class matches any character, because every character is either a whitespace character (\s) or a non-whitespace character (\S). [\w\W] and [\d\D] work the same way, but [\s\S] is the conventional choice.
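A quick way to see the difference in PowerShell (a sketch; $text is a made-up sample cut from Example 2):

```powershell
# "`n" is a newline inside a double-quoted PowerShell string
$text = "[Mobile: `n`n%mobile% |]"

# No inline options: . stops at the first newline, so no ] is reachable
$withDot   = $text -match '\[.*\]'        # False
# [\s\S] covers every character, newlines included
$withClass = $text -match '\[[\s\S]*\]'   # True
$withDot
$withClass
```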

To get rid of the {7}, you can simply disallow any closing ] before the one you actually want to match. Incidentally, this also makes DotAll unnecessary, because a negated character class such as [^\]] matches newlines as well. So something like this should work fine for you:

\[([^\]:]*)Fax([^\]]*)\]

It looks a bit ugly, but it simply means this:

\[        # literal [
(         # capturing group 1
  [^\]:]* # match as many non-:, non-] characters as possible
)         # end of group 1
Fax       # literal Fax
(         # capturing group 2
  [^\]]*  # match as many non-] characters as possible
)         # end of group 2
\]        # literal ]
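.NET regexes accept this commented layout directly through the inline (?x) option (IgnorePatternWhitespace), which removes unescaped whitespace from the pattern and treats # as the start of a comment. A sketch, assuming the pattern is kept in a PowerShell single-quoted here-string and applied to Example 3 with real newlines; no (?s) is needed:

```powershell
# Example 3 from the question; "`n" is a newline inside a double-quoted string
$example3 = "<span>[Mobile: `n`n%mobile% |] Phone: %telephone% [| Fax: `n%faxNumber%]</span>"

# The commented pattern, prefixed with (?x) so .NET ignores the layout
# whitespace and the # comments. The closing '@ must start a line.
$faxPattern = @'
(?x)
\[          # literal [
(           # capturing group 1
  [^\]:]*   # as many non-:, non-] characters as possible
)           # end of group 1
Fax         # literal Fax
(           # capturing group 2
  [^\]]*    # as many non-] characters as possible (newlines included)
)           # end of group 2
\]          # literal ]
'@

$faxResult    = $example3 -ireplace $faxPattern, 'MyReplacement2'
$mobileResult = $example3 -ireplace '\[([^\]:]*)Mobile([^\]]*)\]', 'MyReplacement1'
$faxResult
$mobileResult
```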

Further reading on character classes.

Note that neither your patterns nor mine need multiline mode m: all it does is make ^ and $ match at the beginning and end of each line, respectively, and none of the patterns contain these metacharacters, so the modifier does nothing.
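To check this, a small sketch (sample strings and variable names are illustrative) comparing the Mobile pattern with and without (?m) on Example 2, plus a case where m does matter because the pattern uses ^:

```powershell
# Example 2 from the question
$example2 = "<span>[Mobile: `n`n%mobile% |] Phone: %telephone% [| Fax: &nbsp;&nbsp;%faxNumber%]</span>"
$mobilePattern = '\[([^\]:]*)Mobile([^\]]*)\]'

# Identical results: the pattern contains neither ^ nor $
$plain     = $example2 -ireplace $mobilePattern, 'MyReplacement1'
$multiline = $example2 -ireplace ('(?m)' + $mobilePattern), 'MyReplacement1'
$plain -ceq $multiline   # True

# m only matters once ^ or $ appear: without it, ^ anchors to the string start
$anchorPlain     = "line1`nline2" -match '^line2'      # False
$anchorMultiline = "line1`nline2" -match '(?m)^line2'  # True
```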

My console output:

PS> $x = "<span>[Mobile: %mobile% |] Phone: %telephone% [| Fax: &nbsp;&nbsp;%faxNumber%]</span>"
PS> $x -ireplace '\[([^\]:]*)Mobile([^\]]*)\]', 'MyReplacement1'
<span>MyReplacement1 Phone: %telephone% [| Fax: &nbsp;&nbsp;%faxNumber%]</span>
PS> $x -ireplace '\[([^\]:]*)Fax([^\]]*)\]', 'MyReplacement2'
<span>[Mobile: %mobile% |] Phone: %telephone% MyReplacement2</span>