Using XPath with PHP to parse HTML

Question

I'm currently trying to parse some data from a forum. Here is the code:

$xml = simplexml_load_file('https://forums.eveonline.com');

$names = $xml->xpath("html/body/div/div/form/div/div/div/div/div[*]/div/div/table//tr/td[@class='topicViews']");
foreach($names as $name) 
{
    echo $name . "<br/>";
}

Anyway, the problem is that I'm using a Google XPath extension to help me get the path, and I'm guessing that Google is changing the HTML enough that the path no longer matches when I run this search from my website. Is there some way I can make the host look at the site through Google Chrome so that it gets the right code? What would you suggest?

Thanks!

Recommended answer

My suggestion is to always use DOMDocument as opposed to SimpleXML, since it's a much nicer interface to work with and makes tasks a lot more intuitive.

The following example shows you how to load the HTML into a DOMDocument object and query the DOM using XPath. All you really need to do is find all td elements with a class name of topicViews; the example then outputs the nodeValue of each node in the DOMNodeList returned by this XPath query.

/* Use internal libxml errors -- turn on in production, off for debugging */
libxml_use_internal_errors(true);
/* Create a new DOMDocument object */
$dom = new DOMDocument;
/* Load the HTML */
$dom->loadHTMLFile("https://forums.eveonline.com");
/* Create a new DOMXPath object */
$xpath = new DOMXPath($dom);
/* Query all <td> nodes whose class attribute is exactly "topicViews" */
$nodes = $xpath->query("//td[@class='topicViews']");
/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");
/* Traverse the DOMNodeList object to output each DOMNode's nodeValue */
foreach ($nodes as $i => $node) {
    echo "Node($i): ", $node->nodeValue, "\n";
}
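
The same approach can be tried offline with a minimal sketch like the one below. The inline markup, the topicTitle and highlight class names, and the cell values are hypothetical stand-ins, since the structure of the real forum page is not shown here. The sketch also swaps the exact attribute comparison for a token match on the class attribute, which is an assumption about what may be wanted rather than part of the answer above: @class='topicViews' only matches when the attribute is exactly that string, so a cell with several classes would be skipped.

```php
<?php
// Hypothetical markup standing in for the forum page (not the real page).
$html = <<<'HTML'
<html><body>
<table>
  <tr><td class="topicTitle">First topic</td><td class="topicViews">1,024</td></tr>
  <tr><td class="topicTitle">Second topic</td><td class="topicViews highlight">87</td></tr>
</table>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Treat "topicViews" as one whitespace-separated token of the class attribute,
// so class="topicViews highlight" is matched as well.
$query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' topicViews ')]";

$views = [];
foreach ($xpath->query($query) as $node) {
    $views[] = trim($node->nodeValue);
}

print_r($views);
```

With this sample markup the token match finds both cells, while the exact comparison //td[@class='topicViews'] would find only the first one.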
