Regular expression to remove HTML tags


Problem description

I am using the following regular expression to remove HTML tags from a string. It works, except that the closing tag is left behind. If I attempt to remove <a href="blah">blah</a>, it leaves the </a>.

I do not know regular expression syntax at all and fumbled through this. Can someone with regex knowledge please provide me with a pattern that will work?

Here is my code:

  string sPattern = @"</?!?(img|a)[^>]*>";
  Regex rgx = new Regex(sPattern);
  Match m = rgx.Match(sSummary);
  string sResult = "";
  if (m.Success)
   sResult = rgx.Replace(sSummary, "", 1);

I am looking to remove the first occurrence of the <a> and <img> tags.

Solution

Using a regular expression to parse HTML is fraught with pitfalls. HTML is not a regular language and hence can't be 100% correctly parsed with a regex. This is just one of many problems you will run into. The best approach is to use an HTML / XML parser to do this for you.
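
As a rough illustration of the parser route, here is a minimal sketch that uses the XML parser in System.Xml.Linq. It assumes the input is a well-formed XML/XHTML fragment (every tag closed, attributes quoted), which real-world HTML often is not; for arbitrary HTML a dedicated HTML parser such as the third-party HtmlAgilityPack library would be the more robust choice. The class and method names are hypothetical and not part of the original question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagRemover
{
    // Hypothetical helper: removes the first <a> or <img> element from a
    // well-formed fragment, keeping any child content such as link text.
    public static string RemoveFirstLinkOrImage(string fragment)
    {
        // Wrap the fragment in a synthetic root so it parses as one element.
        XElement root = XElement.Parse(
            "<root>" + fragment + "</root>",
            LoadOptions.PreserveWhitespace);

        // First <a> or <img> in document order, or null if there is none.
        XElement target = root.Descendants()
            .FirstOrDefault(e => e.Name.LocalName == "a" || e.Name.LocalName == "img");

        if (target != null)
        {
            // Replace the element with its own children; for <img> there are
            // no children, so the element is simply removed.
            target.ReplaceWith(target.Nodes());
        }

        // Serialize the remaining nodes without the synthetic root.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Because the parser understands element nesting, the opening and closing tags are removed together, which is the part the regex in the question misses.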

Here is a link to a blog post I wrote a while back which goes into more detail about this problem.

That being said, here's a solution that should fix this particular problem. It is by no means a perfect solution, though.

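// Match an opening <img> or <a> tag and capture the text up to the next '<'.
var pattern = @"<(img|a)[^>]*>(?<content>[^<]*)<";
var regex = new Regex(pattern);
var m = regex.Match(sSummary);
if ( m.Success ) {
  // Keep only the text found between the tags.
  sResult = m.Groups["content"].Value;
}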
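
The pattern above returns only the text between the tags. If the goal is instead to keep the rest of the string and drop just the first <a> or <img> tag, one possible variation is sketched below. It assumes that the link text should be kept and that anchors are not nested; the class and method names are hypothetical, and the same caveats about parsing HTML with a regex still apply.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Matches either a complete <a ...>text</a> pair (capturing the text)
    // or a standalone <img ...> tag.
    private static readonly Regex FirstTag = new Regex(
        @"<a\b[^>]*>(?<content>.*?)</a\s*>|<img\b[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: removes only the first match from the input.
    public static string RemoveFirstTag(string html)
    {
        // For an anchor, the match is replaced by its inner text; for an
        // <img>, the "content" group is empty, so the tag disappears.
        return FirstTag.Replace(html, m => m.Groups["content"].Value, 1);
    }
}
```

For example, TagStripper.RemoveFirstTag("<a href=\"blah\">blah</a>") returns "blah", with both the opening and the closing tag removed.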
