Regular expression to remove HTML tags

Views: 44 · Published: 2021/12/3 0:05:00 · Tags: c# .net regex


Problem description


I am using the following regular expression to remove HTML tags from a string. It works, except that the closing tag is left behind. If I attempt to remove <a href="blah">blah</a>, it leaves the </a>.



I do not know regular expression syntax at all and fumbled my way through this. Can someone with regex knowledge please provide a pattern that will work?

Here is my code:
    string sPattern = @"</?!?(img|a)[^>]*>";
    Regex rgx = new Regex(sPattern);
    Match m = rgx.Match(sSummary);
    string sResult = "";
    if (m.Success)
        sResult = rgx.Replace(sSummary, "", 1);

I am looking to remove the first occurrence of the <a> and <img> tags.

Solution

Using a regular expression to parse HTML is fraught with pitfalls.  HTML is not a regular language and hence can't be 100% correctly parsed with a regex.  This is just one of many problems you will run into.  The best approach is to use an HTML / XML parser to do this for you.

Here is a link to a blog post I wrote a while back that goes into more detail about this problem:


http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx
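As a rough illustration of the parser-based approach recommended above, here is a minimal sketch that uses System.Xml.Linq from the .NET base class library. It rests on a big assumption: the input must be a well-formed XML fragment (every tag closed, attributes quoted), which real-world HTML often is not. The names TagStripper and RemoveFirstTag are made up for this example and are not part of the original question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TagStripper
{
    // Removes the first <a> or <img> element but keeps its inner content.
    // ASSUMPTION: the fragment is well-formed XML; XElement.Parse throws otherwise.
    public static string RemoveFirstTag(string fragment)
    {
        // Wrap the fragment in a dummy root so several top-level nodes can be parsed.
        var root = XElement.Parse("<root>" + fragment + "</root>",
                                  LoadOptions.PreserveWhitespace);

        // Descendants() walks the tree in document order, so this is the first match.
        var target = root.Descendants()
                         .FirstOrDefault(e => e.Name.LocalName == "a"
                                           || e.Name.LocalName == "img");

        // Replace the element with its own children (nothing for an empty <img/>).
        if (target != null)
            target.ReplaceWith(target.Nodes());

        // Serialize the children of the dummy root back into one string.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

For example, TagStripper.RemoveFirstTag("<a href=\"blah\">blah</a>") returns "blah". For markup that is not well-formed, a dedicated HTML parser would be needed instead.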


That being said, here's a solution that should fix this particular problem. It captures the text between the opening tag and the next < in a named group called content. It is by no means a perfect solution, though.
var pattern = @"<(img|a)[^>]*>(?<content>[^<]*)<";
var regex = new Regex(pattern);
var m = regex.Match(sSummary);
if (m.Success)
{
    // Keep only the text captured between the opening tag and the next '<'.
    sResult = m.Groups["content"].Value;
}
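For context on why the code in the question leaves the closing tag behind: Replace is called with a count of 1, so only the first match (the opening tag) is removed, and the closing tag would have been a second match. One possible alternative, sketched below, matches a whole <a>...</a> element or a standalone <img> tag in a single match and substitutes the captured inner text. The class name FirstTagRemover is hypothetical, the sketch assumes <a> elements are not nested, and it carries all the usual limits of parsing HTML with a regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstTagRemover
{
    // Either a complete <a ...>...</a> element (inner text captured as "content")
    // or a standalone <img ...> tag. \b stops <a from also matching <abbr>.
    private static readonly Regex TagPattern = new Regex(
        @"<a\b[^>]*>(?<content>.*?)</a\s*>|<img\b[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Remove(string input)
    {
        // count = 1 limits the replacement to the first match only.
        // ${content} expands to an empty string when the <img> alternative matched.
        return TagPattern.Replace(input, "${content}", 1);
    }
}
```

With this sketch, FirstTagRemover.Remove("<a href=\"blah\">blah</a>") returns "blah", and any later <a> or <img> tags in the input are left untouched.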


                        