Finding HTML tags in a string

Question

I know this question has been asked on SO before, but I can't find the right one, and I'm still bad at regex :/

I have a string, and that string is valid HTML. Now I want to find all the tags with a certain name and attribute.

I tried this regex (here for a div with a type attribute): /(<div type="my_special_type" src="(.*?)<\/div>)/

Example string:

<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>

If I use preg_match, I only get <div type="special_type" src="bla"> match me</div>, which makes sense because the other one has its attributes in a different order.

What regex do I need to get the following array when using preg_match on the example string?

array(0 => '<div type="special_type" src="bla"> match me</div>',
      1 => '<div src="blaw" type="special_type" > match me too</div>')

Recommended answer

General advice: don't use regex to parse HTML. It gets messy as soon as the HTML changes; even attributes appearing in a different order, as in your example, break a naive pattern.
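
A regex that tolerates any attribute order is possible, but only under narrow assumptions. The minimal sketch below uses a lookahead so `type="special_type"` may appear anywhere in the opening tag, and `preg_match_all()` instead of `preg_match()`, because `preg_match()` stops after the first match. It assumes double-quoted attribute values, no `>` inside attribute values, and no nested `<div>` inside a matched element.

```php
<?php
// Fragile sketch: match a whole <div>...</div> whose opening tag contains
// type="special_type" anywhere, in any attribute order.
// Assumes double-quoted values, no ">" inside attribute values,
// and no nested <div> inside a matched element.
$str = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

// (?=...) is a lookahead: it checks the rest of the opening tag for the
// attribute without consuming it, so [^>]*> can then match the whole tag.
// The s modifier lets .*? cross newlines inside an element's content.
$pattern = '~<div\b(?=[^>]*\stype="special_type")[^>]*>.*?</div>~s';

// preg_match() stops at the first match; preg_match_all() collects all of them
preg_match_all($pattern, $str, $m);
print_r($m[0]);
```

Even this pattern breaks on nested `<div>` elements, single-quoted or unquoted attributes, or a `>` inside an attribute value, which is exactly the kind of mess the advice above warns about.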

Use DOMDocument instead:

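<?php
$str = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($str);
$selector = new DOMXPath($doc);

// Every <div> whose type attribute equals "special_type", whatever the attribute order
$result = $selector->query('//div[@type="special_type"]');

// loop through all found items
$matches = array();
foreach ($result as $node) {
    echo $node->getAttribute('src'), PHP_EOL; // prints "bla", then "blaw"
    $matches[] = $doc->saveHTML($node);       // outer HTML of the matched element
}
print_r($matches);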
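
The same DOM approach can be wrapped in a reusable function that returns the outer HTML of each matching element, close to the array the question asks for. This is a sketch assuming PHP 7+ with the DOM extension; `findTagsByAttribute` is a hypothetical name. It uses `getElementsByTagName()` plus `getAttribute()` instead of XPath, so the attribute value never has to be quoted inside an XPath expression.

```php
<?php
// Hypothetical helper: returns the outer HTML of every <$tag> element whose
// attribute $attr equals $value, parsing the HTML instead of using a regex.
function findTagsByAttribute(string $html, string $tag, string $attr, string $value): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName($tag) as $element) {
        // getAttribute() returns '' when the attribute is missing
        if ($element->getAttribute($attr) === $value) {
            // saveHTML() with a node argument serializes just that element
            $matches[] = $doc->saveHTML($element);
        }
    }
    return $matches;
}

$str = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

print_r(findTagsByAttribute($str, 'div', 'type', 'special_type'));
```

Because `saveHTML()` re-serializes the element from the parsed tree, whitespace inside the opening tag (such as the space before `>` in the last `<div>`) may be normalized rather than returned byte for byte.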
