DotAll and multiline RegEx
Problem description
I've got a little trouble using regex in PowerShell. It seems like there is an implementation error or something.
The text I want to work with is an HTML file, which looks like this (Example1):
<span>[Mobile: %mobile% |] Phone: %telephone% [| Fax: %faxNumber%]</span>
<Span>
The problem is that, caused by HTML editors, I may also get something like this (Example2):
<span>[Mobile:
%mobile% |] Phone: %telephone% [| Fax:&nbsp;%faxNumber%]</span>
So as you see, we get line breaks and HTML-escaped, fixed whitespace.
My PowerShell regex looks like this:
$x = $x -ireplace '(?ms)\[(.?){7}Fax(.*?)\]', 'MyReplacement1'
and this:
$x = $x -ireplace '(?ms)\[(.?){7}Mobile(.*?)\]', 'MyReplacement2'
Basically, [ marks the beginning of a variable and ] the end of it. Two problems arise from this:
I use (.?){7} to allow SOME (here up to 7) characters and to avoid matching the whole part between the first [ near Mobile and the last ] near Fax (which would happen if I used (.*?) instead of (.?){7}). I'm not sure if there are alternatives that would allow ANY number (and not just 7) of characters between the starting [ and the variable keyword "Fax", for example. This would be useful to avoid mismatches when stuff like &nbsp;&nbsp; gets added (where only 7 characters would not be enough and, like I said, (.*?) will fail). Hope I was able to explain it (kinda hard). If not, please feel free to ask!
I'm grateful for any help, and even regex recommendations from the pros to avoid any further problems I'm not thinking about right now...
EDIT: (Example3):
<span>[Mobile:
%mobile% |] Phone: %telephone% [| Fax:
%faxNumber%]</span>
Answer
The trick around DotAll mode is to use [\s\S] instead of . (the dot). This character class matches any character (because it matches both whitespace and non-whitespace characters). (As does [\w\W] or [\d\D], but the spaces seem to be kind of a convention.)
To get around the 7, you can simply disallow a closing ] before the one you actually want to match (that, by the way, also makes DotAll unnecessary). So something like this should work fine for you:
\[([^\]:]*)Fax([^\]]*)\]
It looks a bit ugly, but it simply means this:
\[          # literal [
(           # capturing group 1
  [^\]:]*   # match as many non-:, non-] characters as possible
)           # end of group 1
Fax         # literal Fax
(           # capturing group 2
  [^\]]*    # match as many non-] characters as possible
)           # end of group 2
\]          # literal ]
Further reading on character classes.
Note that none of these patterns need multiline mode m (neither yours nor mine), because all it does is make ^ and $ match line beginnings and endings, respectively. But none of the patterns contain these meta-characters, so the modifier does not do anything.
My console output:
PS> $x = "<span>[Mobile: %mobile% |] Phone: %telephone% [| Fax:&nbsp;%faxNumber%]</span>"
PS> $x -ireplace '\[([^\]:]*)Mobile([^\]]*)\]', 'MyReplacement1'
<span>MyReplacement1 Phone: %telephone% [| Fax:&nbsp;%faxNumber%]</span>
PS> $x -ireplace '\[([^\]:]*)Fax([^\]]*)\]', 'MyReplacement2'
<span>[Mobile: %mobile% |] Phone: %telephone% MyReplacement2</span>
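The console session above only uses single-line input. As a minimal sketch, the following applies the same two patterns to input shaped like Example3, with line breaks inside the brackets and one `&nbsp;` added for illustration. The variable names (`$mobilePattern`, `$faxPattern`, `$overMatch`) and the exact test string are assumptions made for this example, not part of the original posts. The last statement shows, under the same assumptions, why swapping `.` for `[\s\S]` alone is not enough: a lazy quantifier still starts at the first `[`.

```powershell
# Example3-style input; the single-quoted here-string keeps the line breaks
# and prevents any variable expansion inside the text.
$x = @'
<span>[Mobile:
%mobile% |] Phone: %telephone% [| Fax:&nbsp;
%faxNumber%]</span>
'@

# Patterns from the answer. [^\]] matches every character except ],
# including line breaks, so neither (?s) nor (?m) is required.
$mobilePattern = '\[([^\]:]*)Mobile([^\]]*)\]'
$faxPattern    = '\[([^\]:]*)Fax([^\]]*)\]'

$result = $x -ireplace $mobilePattern, 'MyReplacement1'
$result = $result -ireplace $faxPattern, 'MyReplacement2'
$result      # <span>MyReplacement1 Phone: %telephone% MyReplacement2</span>

# For comparison: [\s\S]*? crosses line breaks too, but the match begins at the
# first [ and runs through "Fax" to the last ], swallowing both variables.
$overMatch = $x -ireplace '\[[\s\S]*?Fax[\s\S]*?\]', 'X'
$overMatch   # <span>X</span>
```

The negated character class works here because it bounds each match by the nearest closing `]`, which is the property the fixed `{7}` count was trying to approximate.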