Parse HTML with PHP's HTML DOMDocument
Problem description
I was trying to do it with getElementsByTagName, but it wasn't working. I'm new to using DOMDocument to parse HTML; I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)
I googled around for a while looking for some explanations but didn't find anything that helped (not with the class, anyway).
So I want to capture "Capture this text 1" and "Capture this text 2" and so on.
It doesn't look too hard, but I can't figure it out :(
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
Solution
If you want to get:
- the text
- that's inside a <div> tag with class="text"
- that's, itself, inside a <div> with class="main"
I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags with a given name (while you only want some of them).
Instead, I would run an XPath query on your document, using the DOMXPath class.
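To make the getElementsByTagName point concrete, here is a minimal sketch of what that route involves: it returns every <div> in the document, so both class conditions have to be checked by hand on each element and on its parent. The helper name captureWithTagName() is hypothetical, and the sketch assumes the exact class values from the question (like @class="text" in XPath, it would miss a <div class="text other">).

```php
<?php
// Sketch only: captureWithTagName() is a hypothetical helper, not a PHP API.
// getElementsByTagName('div') returns EVERY <div>, so the class checks
// from the question have to be done manually.
function captureWithTagName(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $results = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // Condition 1: the <div> itself has class="text" (exact match).
        if ($div->getAttribute('class') !== 'text') {
            continue;
        }
        // Condition 2: its direct parent is a <div> with class="main".
        $parent = $div->parentNode;
        if ($parent instanceof DOMElement
            && $parent->nodeName === 'div'
            && $parent->getAttribute('class') === 'main') {
            // textContent includes the surrounding newlines, so trim it.
            $results[] = trim($div->textContent);
        }
    }
    return $results;
}

$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;

print_r(captureWithTagName($html));
```

The XPath approach expresses the same two conditions in a single query string, which is why it is the simpler choice here.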
For example, something like this should work to load the HTML string into a DOM object and instantiate the DOMXPath class:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;
// Parse the HTML string into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
// DOMXPath runs XPath queries against that tree.
$xpath = new DOMXPath($dom);
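The sample HTML in the question is well-formed, but scraped pages often are not, and DOMDocument::loadHTML() emits a PHP warning for each parse problem it reports. A minimal sketch, assuming you want to tolerate such input: libxml_use_internal_errors(true) makes libxml collect the errors instead of printing them, and libxml_clear_errors() discards them afterwards. loadHtmlQuietly() is a hypothetical helper name, and the messy input string is made up for illustration.

```php
<?php
// Sketch only: loadHtmlQuietly() is a hypothetical helper, not a PHP API.
function loadHtmlQuietly(string $html): DOMXPath
{
    // Collect libxml parse errors internally instead of emitting warnings;
    // remember the previous setting so it can be restored.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return new DOMXPath($dom);
}

// Made-up input: a stray </span> and an unclosed outer <div>.
$messy = '<div class="main"><div class="text">Capture this text 3</span></div>';

$xpath = loadHtmlQuietly($messy);
foreach ($xpath->query('//div[@class="main"]/div[@class="text"]') as $tag) {
    var_dump(trim($tag->nodeValue));
}
```

If you need to inspect the problems rather than ignore them, call libxml_get_errors() before libxml_clear_errors(); it returns them as an array of LibXMLError objects.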
Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you are searching for:
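// Select each div.text whose direct parent is a div.main.
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    // nodeValue includes the surrounding newlines, so trim it.
    var_dump(trim($tag->nodeValue));
}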
Executing this gives me the following output:
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
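One caveat about the query above: @class="text" compares the whole attribute value, so a <div class="text highlighted"> would not match. DOMXPath implements XPath 1.0, which has no class selector; a common workaround pads the normalized class list with spaces and looks for the class name as a space-delimited token. A minimal sketch, with hasClass() as a hypothetical helper that builds that predicate (it assumes class names contain no quote characters) and made-up sample HTML:

```php
<?php
// Sketch only: hasClass() is a hypothetical helper, not a PHP API.
// It assumes $class contains no quotes, since it is pasted into the query.
function hasClass(string $class): string
{
    // XPath 1.0 idiom: pad the normalized class list with spaces, then look
    // for " name " so "text" does not also match "context".
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$html = <<<HTML
<div class="main wide">
<div class="text highlighted">
Capture this text 1
</div>
</div>
<div class="main">
<div class="context">
Not this one
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same structure as the original query, but matching class tokens.
$query = '//div[' . hasClass('main') . ']/div[' . hasClass('text') . ']';
foreach ($xpath->query($query) as $tag) {
    var_dump(trim($tag->nodeValue));
}
```

On this sample the exact-match query from the answer finds nothing, while the token-based query finds the first inner <div> only.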