RegEx: How to match a string that is not part of an HTML tag or special character?
Problem description
Hi,
I have some HTML text. When I display it, I want to highlight some keywords. I don't want a keyword to match if it is part of an HTML tag or a special character such as &nbsp;
For example:
My HTML text: <span>Hello&#160;&#160;Welcome to my Spa No. 160</span>
My keywords: spa 160
For highlighting I use <span class="highlight">keyword</span>
But now it matches the spa inside the tag <span> and the 160 inside the special character &#160;
How can I overcome this? I use C# RegEx.
I need a RegEx that matches the keyword but not inside tags or special characters.
Thanks in advance.
What you want is a negative lookbehind:
(?<!</?[^>]*|&[^;]*)(\b160\b|\bspa\b)
and replace with
<span class="highlight">$1</span>
The negative lookbehind syntax is (?<! ... ), which means the keyword must not be preceded by the given pattern. Here that pattern is either the beginning of a tag, </?[^>]*, or the beginning of an HTML entity that isn't complete, &[^;]*.
</?[^>]* matches an opening angle bracket, optionally followed by a slash, followed by any number of characters that aren't closing angle brackets.
&[^;]* matches an ampersand followed by any number of characters that aren't semicolons.
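To see which positions each branch of the lookbehind rejects, here is a small sketch. The sample string, the `&spa;` pseudo-entity, and the names `LookbehindDemo` and `FindVisibleMatches` are made up for illustration and are not from the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Returns the start index of every "spa" that passes the negative lookbehind,
    // i.e. that is neither inside an unclosed tag nor inside an unfinished entity.
    public static int[] FindVisibleMatches(string html)
    {
        var regex = new Regex(@"(?<!</?[^>]*|&[^;]*)\bspa\b", RegexOptions.IgnoreCase);
        return regex.Matches(html).Cast<Match>().Select(m => m.Index).ToArray();
    }

    public static void Main()
    {
        // Made-up sample: one "spa" in an attribute value, one right after '&',
        // and two in the visible text.
        string html = "<div class=\"spa\">spa &spa; spa</div>";
        Console.WriteLine(string.Join(", ", FindVisibleMatches(html)));
    }
}
```

Under these assumptions the program should print 17, 27: the occurrence in the attribute value is rejected by the tag branch, the one after the ampersand by the entity branch, and only the two in the visible text remain.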
Here's how to incorporate this into your C# code:
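string[] keywords = { "spa", "160", "whatever" };
// Verbatim strings (@"...") keep \b as a regex word boundary rather than a C# backspace character.
// Regex.Replace returns a new string, so the result has to be assigned.
htmlContent = Regex.Replace(htmlContent, @"(?<!</?[^>]*|&[^;]*)(\b" + string.Join(@"\b|\b", keywords) + @"\b)", "<span class=\"highlight\">$1</span>", RegexOptions.IgnoreCase);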
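For reference, the same idea can be sketched as a complete program run against the sample text from the question. The names `KeywordHighlighter` and `Highlight` are hypothetical, and the call to `Regex.Escape` is an addition that the original answer does not include. It is there on the assumption that keywords may contain regex metacharacters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    // Hypothetical helper, not part of the original answer.
    public static string Highlight(string html, params string[] keywords)
    {
        // Escape each keyword so characters such as '.' or '+' are matched literally.
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));

        // Verbatim strings keep \b as a regex word boundary.
        // The lookbehind rejects positions inside an unclosed tag or an unfinished entity.
        string pattern = @"(?<!</?[^>]*|&[^;]*)\b(" + alternation + @")\b";

        // $1 re-inserts the matched text, so its original casing is preserved.
        return Regex.Replace(html, pattern,
            "<span class=\"highlight\">$1</span>", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<span>Hello&#160;&#160;Welcome to my Spa No. 160</span>";
        Console.WriteLine(Highlight(html, "spa", "160"));
    }
}
```

With the question's sample this should print <span>Hello&#160;&#160;Welcome to my <span class="highlight">Spa</span> No. <span class="highlight">160</span></span>, leaving both &#160; entities and the surrounding tag untouched. One limitation of the pattern: a bare & in the text with no later semicolon (as in "Tom & Jerry spa") also blocks matches after it, so the input is assumed to have its ampersands properly encoded.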
EDIT: I incorporated the good point made by Andreas Gieriet: you need to make sure you match complete "words" only, by matching word boundaries with \b.
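A quick way to check the effect of the word boundaries is to count matches with and without them. This is only a sketch; the sample sentence and the names `WordBoundaryDemo` and `CountMatches` are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // Counts case-insensitive matches of the given pattern in the text.
    public static int CountMatches(string text, string pattern)
    {
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    public static void Main()
    {
        string text = "A spa with plenty of space";

        // Without \b the keyword also matches the first three letters of "space".
        Console.WriteLine(CountMatches(text, "spa"));       // 2

        // With \b on both sides only the standalone word matches.
        Console.WriteLine(CountMatches(text, @"\bspa\b"));  // 1
    }
}
```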