Regular expression, how to find all A tags which do not contain an IMG tag inside?


Problem description


Suppose we have the following HTML. We need to get all <a href=""></a> tags which DO NOT contain an img tag.

<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>
<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>

I'm using this regular expression to find all the a tag links:

preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>(.*?)</a>!is", $content, $out);

I can modify it like this:

preg_match_all("!<a[^>]+href=\"?'?([^ \"'>]+)\"?'?[^>]*>([^<>]+?)</a>!is", $content, $out);

But how can I tell it to exclude results that contain an <img substring inside <a href=""></a>?

Solution

A DOM parser is the way to go, but for the sake of interest here is a regex solution:

The easiest way to exclude certain matches in regular expressions is to use a 'negative look-ahead' or a 'negative look-behind'. If the pattern inside a negative look-ahead matches at the position where it is tested, the overall match fails at that position.

Example:

^(?!.+<img.+)<a href=\"?\'?.+\"?\'?>.+</a>$

Matches:

<a href="http://domain1.com"><span>Here is link</span></a>
<a href="http://domain2.com" title="">Hello</a>

But does not match:

<a href="http://domain3.com" title=""><img src="" /></a>
<a href="http://domain4" title=""> I'm the image <img src="" /> yeah</a>
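
To check the matches listed above, the pattern can be run over the four sample lines with preg_match_all. This is a minimal sketch and not part of the original answer: the function name linesWithoutImg, the ~ delimiters and the m modifier (so that ^ and $ apply to each line) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns every line of $content that the answer's
// pattern accepts, i.e. a single <a href=...> line with no "<img" in it.
function linesWithoutImg(string $content): array
{
    // "~" delimiters avoid escaping "/"; "m" lets ^ and $ match per line.
    $pattern = '~^(?!.+<img.+)<a href="?\'?.+"?\'?>.+</a>$~m';
    preg_match_all($pattern, $content, $out);
    return $out[0]; // full-line matches
}

// The four sample anchors from the question, one per line.
$content = implode("\n", [
    '<a href="http://domain1.com"><span>Here is link</span></a>',
    '<a href="http://domain2.com" title="">Hello</a>',
    '<a href="http://domain3.com" title=""><img src="" /></a>',
    '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>',
]);

print_r(linesWithoutImg($content)); // only the domain1 and domain2 lines
```

The sketch assumes lines end with "\n"; with "\r\n" line endings the trailing "\r" would sit between </a> and the end of the line and the pattern would not match.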

The negative look-ahead is this part of the pattern:

(?!.+<img.+)

This says: do not match any string that has one or more characters followed by <img, followed by one or more characters.
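
The look-ahead can be tried in isolation to see what it rejects. The following is a minimal sketch, assuming a hypothetical helper passesLookahead and ~ delimiters. Note that .+ needs at least one character on each side of <img, so a string that begins with <img is not rejected by this look-ahead on its own.

```php
<?php
// Hypothetical helper: true when the negative look-ahead, tested at the
// start of $s, lets the match continue.
function passesLookahead(string $s): bool
{
    return preg_match('~^(?!.+<img.+)~', $s) === 1;
}

var_dump(passesLookahead('<a href="x">Hello</a>'));          // bool(true)
var_dump(passesLookahead('<a href="x"><img src="" /></a>')); // bool(false)
// No character precedes "<img", so ".+<img" cannot match and the
// look-ahead does not reject the string.
var_dump(passesLookahead('<img src="" />'));                 // bool(true)
```

In the full pattern this edge case does not matter, because the string must also begin with <a href=.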

<a href=\"?\'?.+\"?\'?>.+</a>

The rest is my general match for anchor tags in HTML; you might want to use an alternative expression.

You may need to omit the start and end anchors ^ and $, depending on your usage.
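
Dropping ^ and $ is not enough when several anchors sit on one line, as in a typical HTML document: the look-ahead then scans to the end of the line, so an <img in any later anchor rejects the earlier ones too. One alternative, which is a different technique from the pattern above, is a look-ahead that is re-tested before every character of the anchor body. The sketch below combines it with the href-capturing part of the question's pattern. The function name hrefsOfImagelessAnchors is hypothetical, and like any regex over HTML it will miss edge cases such as a > inside an attribute value.

```php
<?php
// Hypothetical helper: href values of <a> tags whose body contains no <img.
function hrefsOfImagelessAnchors(string $html): array
{
    // Group 1: the href value. Group 2: the anchor body, consumed one
    // character at a time, stopping at "<img" or "</a>". If the body hits
    // "<img" first, the required "</a>" cannot follow and the anchor fails.
    $pattern = '~<a\b[^>]*\bhref=["\']?([^ "\'>]+)["\']?[^>]*>'
             . '((?:(?!<img\b|</a>).)*)</a>~is';
    preg_match_all($pattern, $html, $out);
    return $out[1];
}

// The question's four anchors, all on one line.
$html = '<a href="http://domain1.com"><span>Here is link</span></a>'
      . '<a href="http://domain2.com" title="">Hello</a>'
      . '<a href="http://domain3.com" title=""><img src="" /></a>'
      . '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>';

print_r(hrefsOfImagelessAnchors($html)); // http://domain1.com, http://domain2.com
```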

More info on look-ahead / look-behind:

http://www.codinghorror.com/blog/2005/10/excluding-matches-with-regular-expressions.html
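
As for the DOM route recommended at the top of the answer, a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes might look like this. The function name anchorsWithoutImages is hypothetical, and the XPath expression //a[@href][not(.//img)] is one way to phrase "an a element with an href and no img descendant".

```php
<?php
// Hypothetical helper: href values of all <a> elements that have an href
// attribute and no <img> descendant. Requires the DOM extension.
function anchorsWithoutImages(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hrefs = [];
    foreach ($xpath->query('//a[@href][not(.//img)]') as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

$html = '<a href="http://domain1.com"><span>Here is link</span></a>'
      . '<a href="http://domain2.com" title="">Hello</a>'
      . '<a href="http://domain3.com" title=""><img src="" /></a>'
      . '<a href="http://domain4" title=""> I\'m the image <img src="" /> yeah</a>';

print_r(anchorsWithoutImages($html)); // http://domain1.com, http://domain2.com
```

Unlike the regex variants, this does not depend on how the anchors are laid out across lines or on the order of their attributes.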
