Parse HTML with PHP's HTML DOMDocument


Question


I was trying to do it with "getElementsByTagName", but it wasn't working. I'm new to using DOMDocument to parse HTML: I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)

I googled around for a while looking for explanations but didn't find anything that helped (not with the class, anyway).

So I want to capture "Capture this text 1" and "Capture this text 2" and so on.

Doesn't look too hard, but I can't figure it out :(

<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>

Solution

If you want to get :

  • The text
  • that's inside a <div> tag with class="text"
  • that's, itself, inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName -- which will return all tags that have a specific name (while you only want some of them).
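To illustrate the point, here is a minimal sketch using the same sample markup. It shows that getElementsByTagName hands back every div (the "main" wrappers as well as the "text" children), so you would have to filter by class and parent yourself. The variable names ($divs, $texts) are only illustrative.

```php
<?php
$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// Returns every <div> in the document: two "main" wrappers plus two "text" children
$divs = $dom->getElementsByTagName('div');
echo $divs->length, "\n"; // 4

// Manual filtering: keep only class="text" divs whose parent is a class="main" div
$texts = [];
foreach ($divs as $div) {
    $parent = $div->parentNode;
    if ($div->getAttribute('class') === 'text'
        && $parent instanceof DOMElement
        && $parent->getAttribute('class') === 'main') {
        $texts[] = trim($div->nodeValue);
    }
}
print_r($texts);
```

This works, but the filtering logic grows quickly as the conditions get more specific.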

Instead, I would use an XPath query on your document, using the DOMXPath class.

For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class :

$html = <<<HTML
<div class="main">
    <div class="text">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
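A side note for real-world pages: loadHTML may emit PHP warnings when the markup is not clean. A common way to keep those out of your output is libxml_use_internal_errors(), roughly as sketched below. The input string here is a made-up example of imperfect markup (my assumption, not taken from the question), and how many errors get reported depends on your libxml version.

```php
<?php
// Hypothetical input: an unclosed <div class="main"> inside a <section>
$html = '<section><div class="main"><div class="text">Capture this text 1</div></section>';

// Collect parse errors internally instead of emitting them as PHP warnings
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Inspect the collected errors if you care about them, then reset the buffer
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable even though the markup was not clean
$xpath = new DOMXPath($dom);
$text = trim($xpath->query('//div[@class="text"]')->item(0)->nodeValue);
echo $text, "\n";
```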


Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for :

$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    var_dump(trim($tag->nodeValue));
}


And executing this gives me the following output :

string 'Capture this text 1' (length=19)
string 'Capture this text 2' (length=19)
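One caveat: [@class="text"] compares the whole attribute value, so it would not match something like class="text highlighted". If your real markup carries several class names, a common XPath 1.0 idiom is to pad the attribute with spaces and use contains(). The sketch below assumes hypothetical markup with extra classes (not from the question), and hasClass() is just an illustrative helper name.

```php
<?php
// Hypothetical markup: same structure as above, with extra class names added
$html = <<<HTML
<div class="main wide">
    <div class="text highlighted">
    Capture this text 1
    </div>
</div>

<div class="main">
    <div class="note text">
    Capture this text 2
    </div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Builds an XPath predicate that matches one class name as a whole word
// among several space-separated names in the class attribute
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$query = '//div[' . hasClass('main') . ']/div[' . hasClass('text') . ']';

$results = [];
foreach ($xpath->query($query) as $tag) {
    $results[] = trim($tag->nodeValue);
}
print_r($results);
```

With this markup, the exact-match query from the answer would return nothing, while the contains() version still finds both text blocks.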
