Parse HTML with PHP's HTML DOMDocument
Problem description
I was trying to do this with getElementsByTagName, but it wasn't working. I'm new to using DOMDocument to parse HTML: I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)
I googled around for a while looking for explanations but didn't find anything that helped (not with the class, anyway).
So I want to capture "Capture this text 1", "Capture this text 2", and so on.
It doesn't look too hard, but I can't figure it out :(
<div class="main">
    <div class="text">
        Capture this text 1
    </div>
</div>
<div class="main">
    <div class="text">
        Capture this text 2
    </div>
</div>
If you want to get:
- the text
- that's inside a <div> tag with class="text"
- that's, itself, inside a <div> with class="main"
I would say the easiest way is not to use DOMDocument::getElementsByTagName, which returns all tags that have a specific name (while you only want some of them).
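To make that point concrete, here is a minimal sketch of what the getElementsByTagName route tends to look like. It assumes the same two-block HTML fragment as in the question; the manual class checks and the $texts array are illustrative additions, not part of the original answer.

```php
<?php
// Assumption: the same HTML fragment as in the question.
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Returns every <div> in the document: both the "main" and the "text" ones.
$divs = $dom->getElementsByTagName('div');

// So the filtering has to be done by hand, node by node.
$texts = [];
foreach ($divs as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}

print_r($texts);
```

It works, but the selection logic ends up spread over a loop and several conditions, which is what a single XPath expression avoids.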
Instead, I would use an XPath query on your document, using the DOMXPath class.
For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:
$html = <<<HTML
<div class="main">
<div class="text">
Capture this text 1
</div>
</div>
<div class="main">
<div class="text">
Capture this text 2
</div>
</div>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}
Executing this gives me the following output:
string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
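One caveat worth keeping in mind: @class="text" compares the whole attribute value, so it would not match an element written as class="text small". If the real page uses several classes per element, a common XPath 1.0 idiom is to pad the normalized attribute with spaces and test for the padded token. The sketch below is only an illustration under that assumption: the sample markup and the hasClassExpr() helper are hypothetical, and the libxml_use_internal_errors() call is there because real-world HTML often triggers parser warnings.

```php
<?php
// Hypothetical markup: elements carry more than one class.
$html = '<div class="main wide"><div class="text small"> Capture this text 1 </div></div>'
      . '<div class="main"><div class="textual">Not this one</div>'
      . '<div class="text">Capture this text 2</div></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Hypothetical helper: builds an XPath 1.0 predicate that matches a single
// class token, so "text" matches "text small" but not "textual".
function hasClassExpr($name)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$query = '//div[' . hasClassExpr('main') . ']/div[' . hasClassExpr('text') . ']';

$texts = [];
foreach ($xpath->query($query) as $tag) {
    $texts[] = trim($tag->nodeValue);
}

print_r($texts);
```

For the exact HTML in the question, where each element has a single class, the simpler @class="..." comparison is enough.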