Regex to find html div class content and data-attr? (preg_match_all)


Problem description

With preg_match_all I want to get the class and data-id attribute values from div tags in HTML.

The example below works, but it only returns class names or only data-id content.

I want the example pattern to find both class and data-id content.

Which regex pattern should I use?

HTML content:

<!-- I want to: $matches[1] == test_class  | $matches[2] == null -->
<div class="test_class"> 

<!-- I want to: $matches[1] == test_class | $matches[2] == 1 -->
<div class="test_class" data-id="1"> 

<!-- I want to: $matches[1] == test_class | $matches[2] == 1 -->
<div id="test_id" class="test_class" data-id="1">

<!-- I want to: $matches[1] == test_class test_class2 | $matches[2] == 1 -->
<div class="test_class test_class2" id="test_id" data-id="1">

<!-- I want to: $matches[1] == 1 | $matches[2] == test_class test_class2 -->
<div data-id="1" class="test_class test_class2" id="test_id" >

<!-- I want to: $matches[1] == 1 | $matches[2] == test_class test_class2 -->
<div id="test_id" data-id="1" class="test_class test_class2">

<!-- I want to: $matches[1] == test_class | $matches[2] == 1 -->
<div class="test_class" id="test_id" data-id="1">

The regex that does not work as I want:

$pattern = '/<(div|i)\s.*(class|data-id)="([^"]+)"[^>]*>/i';

preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

Thanks in advance.

Solution

Why not use a DOM parser instead?

You could use an XPath expression like //div[@class or @data-id] to locate the elements, then extract their attribute values:

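// Parse the markup into a DOM tree ($html holds the HTML string).
$doc = new DOMDocument();
$doc->loadHTML($html);

// Select every div that carries a class or a data-id attribute.
$xpath = new DOMXPath($doc);
$divs = $xpath->query('//div[@class or @data-id]');
foreach ($divs as $div) {
  // getAttribute() returns an empty string when the attribute is absent.
  $matches = [$div->getAttribute('class'), $div->getAttribute('data-id')];
  print_r($matches);
}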
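
The question asks for null when data-id is missing, while getAttribute() yields an empty string in that case. A minimal sketch of one way to bridge that gap is shown here. The function name divAttributesViaDom and the array keys are hypothetical, chosen only for illustration, and the libxml error suppression is an assumption about how noisy the input markup might be.

```php
<?php
// Hypothetical helper (not from the original answer): returns one row per
// matching div, with null for attributes that are not present.
function divAttributesViaDom(string $html): array
{
    // Assumption: real-world markup may trigger libxml warnings, so collect
    // them internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//div[@class or @data-id]') as $div) {
        $rows[] = [
            // hasAttribute() distinguishes "missing" from "present but empty".
            'class'   => $div->hasAttribute('class') ? $div->getAttribute('class') : null,
            'data-id' => $div->hasAttribute('data-id') ? $div->getAttribute('data-id') : null,
        ];
    }
    return $rows;
}
```

Rows come back in document order, and each row always has both keys, so the attribute order inside the tag no longer matters.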

Demo ~ https://eval.in/1046227
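
If a regex is still required, one possible approach is to put each attribute in its own optional lookahead so that the order of attributes inside the tag does not matter. This is only a sketch under stated assumptions: the function name extractDivAttributes is hypothetical, attribute values are assumed to be double-quoted, and the PREG_UNMATCHED_AS_NULL flag needs PHP 7.2 or later. Unlike the expectations listed in the question, group 1 is always the class and group 2 is always the data-id. A regex like this can still be fooled, for example by text such as class="..." appearing inside another attribute's value, which is the main reason a DOM parser is the more robust choice.

```php
<?php
// Hypothetical helper (not from the original post): regex-only extraction
// of class and data-id from opening div tags.
function extractDivAttributes(string $html): array
{
    $pattern = '/<div\b'
        // Optional lookahead: capture class if it appears anywhere in the tag.
        . '(?:(?=[^>]*?\sclass="([^"]*)"))?'
        // Optional lookahead: capture data-id if it appears anywhere in the tag.
        . '(?:(?=[^>]*?\sdata-id="([^"]*)"))?'
        // Consume the rest of the opening tag.
        . '[^>]*>/i';

    preg_match_all($pattern, $html, $sets, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $result = [];
    foreach ($sets as $set) {
        $result[] = [
            // "?? null" covers PHP versions that drop trailing unmatched groups.
            'class'   => $set[1] ?? null,
            'data-id' => $set[2] ?? null,
        ];
    }
    return $result;
}
```

Because the lookaheads consume no characters, both start scanning from just after "<div", and [^>] keeps each scan inside the current tag.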
