RegEx: How to match a string that is not an HTML tag or special char...?


Problem description

Hi,

I have some HTML text. When I display it, I want to highlight some keywords. I don't want a keyword to match if it is part of an HTML tag or of a special character (entity) such as &nbsp;

For example:
My HTML text: <span>Hello&#160;&#160;Welcome to my Spa No. 160</span>

My keywords: spa 160

For highlighting I use <span class="highlight">keyword</span>

But now it matches the "spa" inside the tag <span> and the "160" inside the special character &#160;

How can I overcome this? I use C# RegEx.

I need a RegEx that matches the keyword, but not inside tags or special characters.

Thank you in advance.

Solution

What you want is a negative lookbehind:

(?<!</?[^>]*|&[^;]*)(\b160\b|\bspa\b)


and replace with

<span class="highlight">$1</span>


The negative lookbehind syntax is: (?<! ... ), which indicates that the keyword cannot be preceded by a certain pattern. That pattern in this case is either the beginning of a tag </?[^>]* or the beginning of an HTML entity &[^;]* that isn't complete.

</?[^>]* indicates an open bracket, possibly followed by a slash, followed by any number of chars that aren't close brackets.

&[^;]* indicates an ampersand followed by any number of chars that aren't semicolons.
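
As a quick check of the pattern described above, here is a minimal sketch that runs it against the sample HTML from the question. The class name LookbehindDemo and the method Highlight are made up for illustration; Regex.Replace and RegexOptions.IgnoreCase are the only library calls used. The sketch relies on .NET accepting variable-length lookbehind, which many other regex engines do not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern from the answer: a keyword that is not preceded by
    // an unclosed tag ("<..." without ">") or an unfinished entity ("&..." without ";").
    public const string Pattern = @"(?<!</?[^>]*|&[^;]*)(\b160\b|\bspa\b)";

    // Hypothetical helper: wraps every allowed match in the highlight span.
    public static string Highlight(string html)
    {
        return Regex.Replace(
            html,
            Pattern,
            "<span class=\"highlight\">$1</span>",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Sample text taken from the question.
        string html = "<span>Hello&#160;&#160;Welcome to my Spa No. 160</span>";
        Console.WriteLine(Highlight(html));
    }
}
```

With the sample input, only the visible "Spa" and the trailing "160" should be wrapped; the <span> tags and the two &#160; entities are left untouched:

<span>Hello&#160;&#160;Welcome to my <span class="highlight">Spa</span> No. <span class="highlight">160</span></span>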

Here's how to incorporate this into your C# code:

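string[] keywords = { "spa", "160", "whatever" };
// Verbatim strings (@"...") keep \b as a regex word boundary; in a regular C# string, "\b" is a backspace character.
string pattern = @"(?<!</?[^>]*|&[^;]*)(\b" + string.Join(@"\b|\b", keywords) + @"\b)";
string result = Regex.Replace(htmlContent, pattern, "<span class=\"highlight\">$1</span>", RegexOptions.IgnoreCase);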


EDIT: I incorporated the good point made by Andreas Gieriet - that you need to ensure you are matching complete "words" only by matching word boundaries with \b.
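
If the keywords come from user input, they may contain regex metacharacters such as "." or "+". The following is a hedged sketch of one way to build the pattern with each keyword passed through Regex.Escape and wrapped in \b. KeywordHighlighter, BuildPattern and Highlight are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordHighlighter
{
    // Same lookbehind as in the answer: not inside an open tag or an unfinished entity.
    private const string NotInTagOrEntity = @"(?<!</?[^>]*|&[^;]*)";

    // Builds (?<!...)(\bkw1\b|\bkw2\b|...) with every keyword escaped.
    public static string BuildPattern(string[] keywords)
    {
        // Regex.Escape keeps characters such as '.' literal instead of "any character".
        string alternatives = string.Join(
            "|",
            keywords.Select(k => @"\b" + Regex.Escape(k) + @"\b"));
        return NotInTagOrEntity + "(" + alternatives + ")";
    }

    public static string Highlight(string html, string[] keywords)
    {
        return Regex.Replace(
            html,
            BuildPattern(keywords),
            "<span class=\"highlight\">$1</span>",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string[] keywords = { "spa", "1.60" };
        // "1x60" is not highlighted because the '.' in "1.60" is escaped.
        Console.WriteLine(Highlight("<p>Spa fee 1.60, code 1x60</p>", keywords));
    }
}
```

One caveat with this design: \b only marks a boundary next to a word character, so a keyword that starts or ends with punctuation (for example "C++") will not match as expected and would need a different boundary check.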

