RegEx: How to match string that is not HTML tag or Special Char...?

Views: 84 · Published: 2019/6/15 10:41:47 · Tags: C#, ASP.NET, regular-expression


Question

Hi,

I have some HTML text. When I display it, I want to highlight some keywords. I don't want a keyword to match if it is part of an HTML tag or of a special character entity such as &nbsp;.

For example:
My HTML text: <span>Hello&#160;&#160;Welcome to my Spa No. 160</span>

My keywords: spa, 160

For highlighting I use <span class="highlight">keyword</span>

But now it matches the "spa" inside the tag <span> and the "160" inside the special character &#160;.

How can I overcome this? I am using C# Regex.

I need a regex that matches the keywords, but not when they appear inside tags or special characters.

Thanks in advance.

Solution

What you want is a negative lookbehind:
(?<!</?[^>]*|&[^;]*)(\b160\b|\bspa\b)
and replace with
<span class="highlight">$1</span>

The negative lookbehind syntax is (?<! ... ), which means that the keyword must not be preceded by a certain pattern. Here that pattern is either the start of a tag that has not been closed yet, </?[^>]*, or the start of an HTML entity that has not been completed yet, &[^;]*.
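As a quick check, the minimal sketch below runs the pattern above unchanged against the sample HTML from the question and lists each hit with its index. It assumes .NET's System.Text.RegularExpressions, which accepts variable-length lookbehind such as [^>]*; many other engines (Python's re module, for example) reject it. The helper name FindKeywordMatches is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the answer's pattern unchanged and returns each hit
// (the captured keyword and its index in the input).
static List<(string Value, int Index)> FindKeywordMatches(string html)
{
    // Verbatim string so \b stays a regex word boundary.
    string pattern = @"(?<!</?[^>]*|&[^;]*)(\b160\b|\bspa\b)";
    var hits = new List<(string Value, int Index)>();
    foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
    {
        hits.Add((m.Groups[1].Value, m.Groups[1].Index));
    }
    return hits;
}

string sample = "<span>Hello&#160;&#160;Welcome to my Spa No. 160</span>";
foreach (var (value, index) in FindKeywordMatches(sample))
{
    Console.WriteLine($"{value} at index {index}");
}
// Prints "Spa at index 37" and "160 at index 45"; both 160s inside &#160; are skipped.
```

Only the visible "Spa" and the standalone "160" survive; the two "160"s inside the &#160; entities are rejected by the &[^;]* branch of the lookbehind.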

</?[^>]* indicates an opening angle bracket, optionally followed by a slash, followed by any number of characters that aren't closing angle brackets.
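To isolate the tag half of the lookbehind, this sketch keeps only </?[^>]* and the single keyword spa. The helper name CountTagSafeHits and the three inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Only the tag alternative of the lookbehind, plus one keyword, to isolate its effect.
// A hit is rejected if a '<' (optionally followed by '/') appears earlier
// with no '>' between it and the hit, i.e. the hit sits inside an open tag.
static int CountTagSafeHits(string html) =>
    Regex.Matches(html, @"(?<!</?[^>]*)\bspa\b", RegexOptions.IgnoreCase).Count;

Console.WriteLine(CountTagSafeHits("<img alt=\"spa\">"));            // 0: inside the tag
Console.WriteLine(CountTagSafeHits("<p>spa</p>"));                   // 1: element text
Console.WriteLine(CountTagSafeHits("<a title=\"Spa day\">Spa</a>")); // 1: only the link text
```

The same rule means a literal < in text that was not escaped as &lt; would hide later keywords until the next >.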

&[^;]* indicates an ampersand followed by any number of chars that aren't semicolons.
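The entity half can be checked the same way. This sketch keeps only &[^;]* and the keyword 160; the helper name CountEntitySafeHits and the inputs are hypothetical examples.

```csharp
using System;
using System.Text.RegularExpressions;

// Only the entity alternative of the lookbehind: a hit is rejected when an '&'
// appears earlier with no ';' between it and the keyword (an unfinished entity).
static int CountEntitySafeHits(string html) =>
    Regex.Matches(html, @"(?<!&[^;]*)\b160\b").Count;

Console.WriteLine(CountEntitySafeHits("Hello&#160;world"));    // 0: the 160 is inside &#160;
Console.WriteLine(CountEntitySafeHits("Room&#160;160"));       // 1: only the number after the entity
Console.WriteLine(CountEntitySafeHits("Tom &amp; Jerry 160")); // 1: the entity is closed by ';'
Console.WriteLine(CountEntitySafeHits("Tom & Jerry 160"));     // 0: a bare '&' also suppresses the hit
```

The last case is a limitation of the pattern itself: a bare & that is never followed by ; hides later keywords until a ; appears, so text that leaves & unescaped may lose some highlights.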

Here's how to incorporate this into your C# code:
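using System.Text.RegularExpressions;

string[] keywords = { "spa", "160", "whatever" };
// Verbatim strings (@"...") keep \b as a regex word boundary; in a normal C# string "\b" is a backspace.
string pattern = @"(?<!</?[^>]*|&[^;]*)(\b" + string.Join(@"\b|\b", keywords) + @"\b)";
string highlighted = Regex.Replace(htmlContent, pattern, "<span class=\"highlight\">$1</span>", RegexOptions.IgnoreCase);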
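For an end-to-end run, the sketch below wraps the answer's approach in a hypothetical helper, HighlightKeywords, and applies it to the sample from the question. It makes one assumption beyond the answer: each keyword is passed through Regex.Escape so that keywords containing regex metacharacters (such as . or +) are matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built on the answer's pattern: wraps each keyword that is not
// inside a tag or an unfinished entity in <span class="highlight">...</span>.
static string HighlightKeywords(string html, string[] keywords)
{
    // Assumption beyond the answer: escape keywords so '.', '+', etc. match literally.
    string alternatives = string.Join(@"\b|\b", Array.ConvertAll(keywords, Regex.Escape));
    string pattern = @"(?<!</?[^>]*|&[^;]*)(\b" + alternatives + @"\b)";
    // $1 re-inserts the captured keyword, so its original casing is kept.
    return Regex.Replace(html, pattern, "<span class=\"highlight\">$1</span>", RegexOptions.IgnoreCase);
}

string sample = "<span>Hello&#160;&#160;Welcome to my Spa No. 160</span>";
Console.WriteLine(HighlightKeywords(sample, new[] { "spa", "160" }));
// <span>Hello&#160;&#160;Welcome to my <span class="highlight">Spa</span> No. <span class="highlight">160</span></span>
```

Because the replacement uses $1 rather than the keyword list, "Spa" keeps its capital S even though the search is case-insensitive.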

EDIT: I incorporated the good point made by Andreas Gieriet: you need to make sure you match only complete "words", by surrounding each keyword with the word-boundary anchor \b.
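To show what the word boundaries change, this sketch counts hits with and without \b on a made-up sentence; the helper name CountHits is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Counts case-insensitive matches of a pattern in a piece of text.
static int CountHits(string text, string pattern) =>
    Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

string sample = "Spa, spacious room 1600, spa 160";
Console.WriteLine(CountHits(sample, "spa|160"));          // 5: also hits "spacious" and "1600"
Console.WriteLine(CountHits(sample, @"\bspa\b|\b160\b")); // 3: whole words only
```

Note that \b on its own already keeps spa from matching inside <span>, because the n that follows is a word character. It cannot reject the 160 inside &#160;, because # and ; are non-word characters, so 160 is still a whole word there; that case is what the &[^;]* lookbehind handles.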
