Regular expression: how to find all A tags which do not contain an IMG tag inside them?
Problem description
Let's suppose that we have the following HTML code. We need to get all <a href=""></a> tags which do NOT contain an img tag inside them.
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
I'm using this regular expression to find all the a tag links:
preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $content, $out);
I can modify it like this, so that the link body may not contain any tags at all:
preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]+?)</a>!is", $content, $out);
But how can I tell it to exclude only the results that contain the <img substring inside <a href=""></a>?
DOM is the way to go, but for the sake of interest, here is a regex solution:
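The easiest way to exclude certain matches in regular expressions is to use a 'negative look-ahead' or a 'negative look-behind'. If the negated sub-expression matches at the position where it is checked, the overall match fails. Placed right after ^ and written with a leading .+, as in the example, it effectively scans the rest of the string.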
Example:
^(?!.+<img.+)<a href=\"?\'?.+\"?\'?>.+</a>$
Matches:
<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
But does not match:
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
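To see the pattern above run against the four sample anchors, here is a minimal PHP sketch. It assumes each anchor sits on its own line; the `~` delimiter and the `m` flag are choices made for this illustration and do not come from the answer.

```php
<?php
// The four sample anchors from the question, one per line.
$content = implode("\n", [
    '<a href="http://domain1.com"><span>Here is link</span></a>',
    '<a href="http://domain2.com" title="">Hello</a>',
    '<a href="http://domain3.com" title=""><img src="" /></a>',
    '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>',
]);

// The answer's expression, delimited with ~ so that the "!" inside (?! ... )
// does not clash with the delimiter. The "m" flag (an assumption of this
// sketch) lets ^ and $ anchor at every line instead of the whole string.
$pattern = '~^(?!.+<img.+)<a href="?\'?.+"?\'?>.+</a>$~m';

preg_match_all($pattern, $content, $out);

// $out[0] holds the full text of every line that matched.
print_r($out[0]);
```

With this input, only the domain1 and domain2 lines should come back in `$out[0]`, because the look-ahead rejects any line in which `<img` appears.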
The negative look-ahead is this part of the expression:
(?!.+<img.+)
This says: do not match any string that has one or more characters, followed by <img, followed by one or more characters.
<a href=\"?\'?.+\"?\'?>.+</a>
The rest is my general match for anchor tags in HTML; you might want to use an alternate match expression.
You may need to omit the start and end ^ $ chars depending on your usage.
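When several anchors share one line, ^ and $ cannot isolate them, so the look-ahead has to move inside the link body. The sketch below is not the answer's expression: it swaps the `(.*?)` group of the question's own pattern for a character-by-character negative look-ahead (sometimes called a tempered dot), and uses `~` as the delimiter. Both are assumptions made for illustration.

```php
<?php
// All four sample anchors on a single line, with no separators.
$content = implode('', [
    '<a href="http://domain1.com"><span>Here is link</span></a>',
    '<a href="http://domain2.com" title="">Hello</a>',
    '<a href="http://domain3.com" title=""><img src="" /></a>',
    '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>',
]);

// Same opening-tag part as the pattern in the question. The body group
// consumes one character at a time, and only while the text at that point
// starts neither "<img" nor "</a>", so a body containing an image can never
// reach its closing tag and the whole anchor is skipped.
$pattern = '~<a[^>]+href="?\'?([^ "\'>]+)"?\'?[^>]*>((?:(?!<img|</a>).)*)</a>~is';

preg_match_all($pattern, $content, $out);

// $out[1] holds the captured href values, $out[2] the link bodies.
print_r($out[1]);
```

For the sample input this should leave only `http://domain1.com` and `http://domain2.com` in `$out[1]`. Nested anchors or `<img` inside comments would still confuse it, which is one reason a DOM parser is the more robust choice.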
More info on look-ahead / look-behind:
http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html
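Since the answer opens by saying that DOM is the way to go, here is a rough sketch of that route using PHP's bundled DOMDocument and DOMXPath classes. The XPath expression and the variable names are assumptions chosen for this example, and the DOM extension has to be enabled.

```php
<?php
// The four sample anchors from the question.
$content = implode("\n", [
    '<a href="http://domain1.com"><span>Here is link</span></a>',
    '<a href="http://domain2.com" title="">Hello</a>',
    '<a href="http://domain3.com" title=""><img src="" /></a>',
    '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>',
]);

// Collect libxml parse warnings internally instead of printing them,
// since the input is an HTML fragment rather than a full document.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Every <a> element that has an href attribute and no <img> descendant.
$hrefs = [];
foreach ($xpath->query('//a[@href][not(.//img)]') as $a) {
    $hrefs[] = $a->getAttribute('href');
}

print_r($hrefs);
```

Here the parser, not the pattern, decides where each anchor begins and ends, so attribute order, quoting style, and line breaks inside the tag stop mattering.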