Parse HTML with PHP's HTML DOMDocument

Problem description

I was trying to do it with "getElementsByTagName", but it wasn't working. I'm new to using DOMDocument to parse HTML: I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)

I googled around for a while looking for an explanation but didn't find anything that helped (not with the class, anyway).

So I want to capture "Capture this text 1" and "Capture this text 2" and so on.

Doesn't look too hard, but I can't figure it out :(

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Solution

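If you want to get:

  • the text
  • that's inside a <div> tag with class="text"
  • that is itself inside a <div> with class="main"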

I would say the easiest way is not to use DOMDocument::getElementsByTagName -- which will return all tags that have a specific name (while you only want some of them).
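To illustrate the point about getElementsByTagName, here is a minimal sketch using the markup from the question. It shows that the method matches on tag name only, so the outer class="main" wrappers come back too and you have to filter by hand. The `$texts` array and the filtering loop are my own illustration, not something from the question.

```php
<?php
// Sample markup taken from the question.
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() matches on tag name only, so all four <div>s come back.
$divs = $dom->getElementsByTagName('div');
echo $divs->length, "\n";

// Filtering by hand: keep only class="text" divs whose parent is class="main".
$texts = [];
foreach ($divs as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}
print_r($texts);
```

This works, but the class and parent checks are exactly what an XPath expression can state in one line.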

Instead, I would use an XPath query on your document, using the DOMXPath class.


For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:

$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
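The sample markup above is clean, but pages fetched from the wild often are not, and loadHTML() then raises PHP warnings. A possible sketch, assuming you would rather collect those parse errors than have them printed. The malformed `$html` string below is a made-up example, not the question's markup.

```php
<?php
// Hypothetical malformed input: the <p> is never closed and <foo> is not an HTML tag.
$html = '<div class="main"><div class="text">Capture this text 1<p><foo></div></div>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Inspect what the parser complained about, then clear the buffer.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

// Restore whatever error handling mode was active before.
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
```

The parser recovers as best it can, so the resulting tree can still be queried afterwards.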


Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}


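Executing this gives me the following output (the formatting comes from Xdebug's var_dump; stock PHP prints string(19) "Capture this text 1" instead):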

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
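One caveat about the query above: @class="text" compares the whole attribute value, so it stops matching as soon as an element carries more than one class. A sketch of the usual workaround, assuming a hypothetical variation of the markup in which the inner <div> has class="text highlight":

```php
<?php
// Hypothetical variation of the question's markup: the inner <div> carries two classes.
$html = <<<HTML
<div class="main">
    <div class="text highlight">
    Capture this text 1
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: no match, because the attribute value is "text highlight".
$exact = $xpath->query('//div[@class="main"]/div[@class="text"]');

// Token match: pad the class list with spaces and look for " text ".
$token = $xpath->query(
    '//div[@class="main"]/div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

echo $exact->length, "\n";
echo $token->length, "\n";
echo trim($token->item(0)->nodeValue), "\n";
```

Padding with spaces keeps a class such as "textual" from matching by accident, which a plain contains(@class, "text") would allow.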
