Highlight words in HTML using regex in C#

Question

I found this article on stackoverflow

highlight words in html using regex & javascript - almost there

Using the article above, I am trying to highlight HTML text on the server using c#. The code is shown below:

string replacePattern = "$1<span style=\"background-color:yellow\">$2</span>";
string searchPattern = String.Format("(?<=^|>)(.*?)({0})(?=.*?<|$)", searchString.Trim());
content = Regex.Replace(content, searchPattern, replacePattern, RegexOptions.IgnoreCase);

The code seems to work great except when trying to highlight a word that is contained in an image source:

Search keyword:

ABC

Search text:

<div><img src="/site/folder/ABC.PNG" /><br />ABC</div>

The result will highlight both the text and the image name.

Any help would be appreciated.

Recommended answer

I'll offer up a solution, but I agree that solely using Regex for parsing HTML can eventually not be worth the effort. That said, you know more about your problem space than the rest of us, so if the HTML you're highlighting is under your control you may be able to test enough of your domain to achieve what you want with regexes.

My solution changes the regex you've supplied to take this approach:

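  1. Match and capture in $1 a > character followed by a non-greedy run of characters that are not in the set [<>].
  2. Match the keyword and capture it in $2.
  3. Match and capture in $3 a non-greedy run of characters that are not in the set [<>], followed by a < character.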

Caveats:

  1. Well-formed HTML works best. If this HTML is user-generated content (UGC), then good luck; you should have used an HTML parser :)
  2. This would highlight content within <textarea>...</textarea>.
  3. This would highlight content within <script>...</script>.

Note that you could expand the capture on the left-hand side to also capture the tag name, and then conditionally skip the replacement for a set of tags such as textarea and script.

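using System;
using System.Text.RegularExpressions;

string searchString = "ABC";
string content = "<div><img src='/site/folder/ABC.PNG' /><br />ABC</div>";
// $1 = '>' plus the text before the keyword, $2 = the keyword, $3 = the text after it up to and including the next '<'
string replacePattern = "$1<span style=\"background-color:yellow\">$2</span>$3";
// Regex.Escape keeps metacharacters in the keyword from being read as regex syntax
string searchPattern = String.Format("(>[^<>]*?)({0})([^<>]*?<)", Regex.Escape(searchString.Trim()));
content = Regex.Replace(content, searchPattern, replacePattern, RegexOptions.IgnoreCase);
Console.WriteLine(content);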
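
The note about capturing the tag name and skipping tags such as textarea and script could be sketched roughly as follows. This is not the answer's code, only one possible way to do it, using a MatchEvaluator. The names Highlighter, Highlight, SkipTags and TagThenText are hypothetical, and putting style in the skip list is an assumption. It only protects the text that directly follows the opening tag, so a script body that itself contains < or > can still be mangled; an HTML parser remains the more robust choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Assumed skip list; adjust to the tags whose text must stay untouched.
    private static readonly HashSet<string> SkipTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "script", "style", "textarea" };

    // "tag" is any tag, "name" is its name (empty for closing tags and comments),
    // "text" is the run of text that directly follows the tag.
    private static readonly Regex TagThenText = new Regex(
        @"(?<tag><(?<name>[a-z][a-z0-9]*)?[^<>]*>)(?<text>[^<>]+)",
        RegexOptions.IgnoreCase);

    public static string Highlight(string html, string keyword)
    {
        string escaped = Regex.Escape(keyword.Trim());
        if (escaped.Length == 0)
            return html; // nothing to highlight

        return TagThenText.Replace(html, m =>
        {
            // Leave text that directly follows a skipped opening tag as it is.
            if (SkipTags.Contains(m.Groups["name"].Value))
                return m.Value;

            // $0 re-inserts the matched keyword with its original casing.
            string text = Regex.Replace(m.Groups["text"].Value, escaped,
                "<span style=\"background-color:yellow\">$0</span>",
                RegexOptions.IgnoreCase);
            return m.Groups["tag"].Value + text;
        });
    }
}
```

Calling Highlighter.Highlight(content, searchString) on the sample from the question leaves the img src attribute alone and wraps only the ABC text after the br tag. Because the keyword is replaced inside each text run separately, every occurrence in a run is highlighted, not just the first.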
