Parse HTML with PHP's HTML DOMDocument

Views: 646 | Published: 2018/6/13 10:08:51 | Tags: php, html, parsing, domdocument

Question

I was trying to do this with getElementsByTagName, but it wasn't working. I'm new to using DOMDocument to parse HTML: I used regex until yesterday, when some kind folks here told me that DOMDocument would be better for the job, so I'm giving it a try :)

I googled around for a while looking for explanations but didn't find anything that helped (not with the class attribute, anyway).

So I want to capture "Capture this text 1", "Capture this text 2", and so on.

It doesn't look too hard, but I can't figure it out :(

    <div class="main">
        <div class="text">
            Capture this text 1
        </div>
    </div>

    <div class="main">
        <div class="text">
            Capture this text 2
        </div>
    </div>

Solution

If you want to get:

- the text
- that's inside a <div> tag with class="text"
- that is, itself, inside a <div> with class="main"

I would say the easiest way is not to use DOMDocument::getElementsByTagName -- which will return all tags that have a specific name (while you only want some of them).
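
To make that concrete, here is a minimal sketch that runs getElementsByTagName on the sample markup from the question. The variable names ($divCount, $texts) and the manual class filter are illustrative assumptions, not something from the original post.

```php
<?php
// Sample markup from the question.
$html = <<<HTML
<div class="main">
    <div class="text">Capture this text 1</div>
</div>
<div class="main">
    <div class="text">Capture this text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() matches on the tag name only, so it returns
// the outer "main" divs as well as the inner "text" divs.
$divs = $dom->getElementsByTagName('div');
$divCount = $divs->length;

// Keeping only the inner divs takes a manual filter on the class attribute.
// Note: this exact-match check does not verify that the parent is "main".
$texts = [];
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text') {
        $texts[] = trim($div->nodeValue);
    }
}

var_dump($divCount, $texts);
```

The filter works, but every extra condition (parent class, nesting depth) has to be written by hand, which is where an XPath expression tends to be shorter.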

Instead, I would use an XPath query on your document, using the DOMXPath class.

For example, something like this should do to load the HTML string into a DOM object and instantiate the DOMXPath class:

    $html = <<<HTML
    <div class="main">
        <div class="text">
            Capture this text 1
        </div>
    </div>

    <div class="main">
        <div class="text">
            Capture this text 2
        </div>
    </div>
    HTML;

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
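
Real-world pages are often less tidy than the sample, and loadHTML may emit PHP warnings for markup the parser dislikes. A possible sketch, assuming you prefer to collect those parse errors instead of having them printed, uses libxml_use_internal_errors. The input string below is a hypothetical example (an HTML5 tag and an unclosed div) that does not come from the question.

```php
<?php
// Hypothetical input: contains a <section> tag and an unclosed <div>.
$html = '<section><div class="main"><div class="text">Capture this text 1</div></section>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Read whatever the parser reported, then clear the buffer and
// restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser recovers from the broken markup, so the query still works.
$xpath = new DOMXPath($dom);
$first = $xpath->query('//div[@class="text"]')->item(0);

var_dump($loaded, count($errors), trim($first->nodeValue));
```

Which errors are reported, if any, depends on the libxml2 version PHP was built against, so the error count is best treated as diagnostic information only.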

Then you can run XPath queries with the DOMXPath::query method, which returns the list of elements you were searching for:

    $tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
    foreach ($tags as $tag) {
        var_dump(trim($tag->nodeValue));
    }

And executing this gives me the following output:

    string 'Capture this text 1' (length=19)
    string 'Capture this text 2' (length=19)
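
One caveat with the query above: @class="main" compares the whole attribute value, so an element with class="main wide" would not match. A common XPath 1.0 workaround is to pad the class attribute with spaces and search for the padded name. The sketch below assumes that approach; the hasClass helper and the extra class names (wide, intro) are hypothetical additions for illustration.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate that matches one
// class name among several. $class is interpolated as-is, so it should
// be a trusted, simple class name.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Variation on the question's markup, with extra classes added.
$html = <<<HTML
<div class="main wide">
    <div class="text intro">Capture this text 1</div>
</div>
<div class="main">
    <div class="text">Capture this text 2</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Same structure as the original query, with token-based class matching.
$query = '//div[' . hasClass('main') . ']/div[' . hasClass('text') . ']';

$texts = [];
foreach ($xpath->query($query) as $tag) {
    $texts[] = trim($tag->nodeValue);
}

var_dump($texts);
```

With this markup, the exact-match query would find only the second text, while the token-based predicate finds both.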
