Strip all HTML tags except links

Question

I am trying to write a regular expression that strips all HTML except links (the <a href and </a> tags, respectively). It does not have to be 100% secure: I am not worried about injection attacks or anything similar, because I am parsing content that has already been approved and published into a SWF movie.

The original "strip tags" regular expression I was using is <(.|\n)+?>. I tried modifying it to <([^a]|\n)+?>, but that, of course, keeps any tag that contains an a anywhere, rather than only tags that start with a followed by a space.
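A quick way to see the problem, sketched in TypeScript on the assumption that its regular expressions behave like ActionScript 3.0's (both follow ECMAScript RegExp rules). The sample markup and the helper names stripAll and stripNonA are hypothetical:

```typescript
// Hypothetical sample markup, used only to compare the two patterns from the question.
const sample = '<table><tr><td><b>Hi</b> <a href="x.html">link</a></td></tr></table>';

// Original "strip tags" pattern: removes every tag.
const stripAll = (html: string): string => html.replace(/<(.|\n)+?>/g, "");

// Attempted modification: [^a] refuses an "a" at ANY position inside the tag,
// so <table> and </table> survive along with the link tags.
const stripNonA = (html: string): string => html.replace(/<([^a]|\n)+?>/g, "");

console.log(stripAll(sample));  // → Hi link
console.log(stripNonA(sample)); // → <table>Hi <a href="x.html">link</a></table>
```

The second result shows the flaw described above: the character class only says "no a anywhere", so it cannot express "the tag name is exactly a".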

Not that it should really matter, but in case anyone cares to know, I am writing this in ActionScript 3.0 for a Flash movie.

Solution

<(?!\/?a(?=>|\s.*>))\/?.*?>

Try this. I had something similar for p tags; it worked for them, so I don't see why it wouldn't work here. It uses a negative lookahead to reject tags whose name is a (optionally preceded by /), where a nested positive lookahead confirms that the a is followed either directly by > or by whitespace, more characters, and then >. Any tag that is not rejected is matched up to the next > character. Put this in a substitution such as:
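To make the lookahead logic concrete, here is a sketch in TypeScript (assumed to behave like ActionScript 3.0 here, since both use ECMAScript-style regular expressions). The pattern is anchored with ^ so each tag can be tested on its own; the helper name wouldStrip and the sample tags are hypothetical:

```typescript
// The answer's pattern, anchored with ^ so it is tested against one whole tag at a time.
const strippable = /^<(?!\/?a(?=>|\s.*>))\/?.*?>/;

// true  -> the negative lookahead did not fire, so the tag would be removed
// false -> the tag is <a ...>, <a> or </a>, which the lookahead protects
function wouldStrip(tag: string): boolean {
  return strippable.test(tag);
}

const tags = ['<a href="x.html">', "</a>", "<a>", "<abbr>", "</abbr>", '<area shape="rect">', "<p>", "</b>"];
for (const tag of tags) {
  console.log(`${tag} -> ${wouldStrip(tag) ? "stripped" : "kept"}`);
}
```

Note how <abbr> and <area ...> are still stripped: the a is followed by a letter, so the inner (?=>|\s.*>) check fails and the tag is not protected.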

s/<(?!\/?a(?=>|\s.*>))\/?.*?>//g;

This should leave only the opening and closing a tags.
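The Perl substitution above maps onto String.prototype.replace with a regex literal carrying the g flag in TypeScript; ActionScript 3.0's String.replace should accept an equivalent global regex literal, but treat that as an assumption to check against the AS3 documentation. The function name stripExceptLinks and the input markup are hypothetical:

```typescript
// Same pattern as the Perl s///g; the g flag makes replace() remove every match.
const NON_LINK_TAG = /<(?!\/?a(?=>|\s.*>))\/?.*?>/g;

// Removes every tag except <a ...>, <a> and </a>.
function stripExceptLinks(html: string): string {
  return html.replace(NON_LINK_TAG, "");
}

const input = '<p>See <b>this</b> <a href="http://example.com/">example</a> and <abbr title="x">HTML</abbr>.</p>';
console.log(stripExceptLinks(input));
// → See this <a href="http://example.com/">example</a> and HTML.
```

One caveat under these assumptions: without the s flag, . does not match a newline, so a tag that spans several lines (which the question's original (.|\n) pattern did handle) is left in place.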
