DOM and XPath scraping - What's wrong here?


Problem description


I need to scrape a piece of text from a webpage. I am using the DOM and XPath to find the data, but I can't seem to select the exact information I need. Here is my code so far. The problem is with the item(0)->nodeValue part: this works for my other scrapes of another page, but not for this one.

$argos_html = file_get_html('http://www.argos.co.uk/static/Product/partNumber/9282197/Trail/searchtext%3EIPOD+TOUCH.htm');

$dom_argos= new DOMDocument();
$dom_argos->loadHTML($argos_html);

$xpath_argos = new DOMXpath($dom_argos);

$expr_currys = "/html/body/div[4]/div[3]/form/div[2]/div/div[5]/ul/li[3]/span";
$nodes_argos = $xpath_argos->query($expr_argos);

$argos_stock_data = $nodes_argos->item(0)->nodeValue;

Could anyone show me where I am going wrong? I always get an error that relates to the ->item(0)->nodeValue; part. If I comment that out, there's no error, but there's no data collected at all...

Should it perhaps be just ->nodeValue;?

I understand this may be down to page structures, but I am new to all of this! Thx

Solution

Running your code, I first get:

Notice: Undefined variable: expr_argos
Warning: DOMXPath::query() [domxpath.query]: Invalid expression

So, first of all, make sure you pass a defined variable as your XPath query. You define $expr_currys but pass $expr_argos, so you should have this:

$nodes_argos = $xpath_argos->query($expr_currys);

instead of what you currently have:

$nodes_argos = $xpath_argos->query($expr_argos);


Then, you get the following error:

Notice: Trying to get property of non-object

on the following line:

$argos_stock_data = $nodes_argos->item(0)->nodeValue;

Basically, this means you are trying to read a property, nodeValue, on something that is not an object: $nodes_argos->item(0);
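One way to avoid that notice is to check whether the query matched anything before reading nodeValue. The following is only a minimal sketch: the inline markup stands in for the real page so it runs offline, and first_node_value() is a hypothetical helper name, not something from your code.

```php
<?php
// Hypothetical markup standing in for the real page, so the sketch runs offline.
$html = '<html><body><ul><li><span>In stock</span></li></ul></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings from messy real-world HTML quiet
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Hypothetical helper: returns the text of the first match,
// or null when the expression is invalid or matches nothing.
function first_node_value(DOMXPath $xpath, string $expr): ?string
{
    $nodes = $xpath->query($expr);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

$found   = first_node_value($xpath, '/html/body/ul/li/span');  // matches the markup above
$missing = first_node_value($xpath, '/html/body/div[4]/span'); // no such element

var_dump($found);
var_dump($missing);
```

With a guard like this, a query that matches nothing gives you null to test for instead of a notice on ->nodeValue.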

I'm guessing your XPath query doesn't match anything in the page; so, the call to the query() method returns an empty node list, and item(0) on it returns null.

You should check your XPath query (it is too long to be easy to follow), making sure it matches something in your HTML page.
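To check a query, you can look at the length of the node list it returns. The sketch below compares a long positional path with a shorter attribute-based one. The markup and the class name "stock" are assumptions made for illustration; the real page will differ, so inspect its HTML to find something stable to select on.

```php
<?php
// Hypothetical markup; the class names are assumptions, not taken from the real page.
$html = '<html><body><div><div><ul>'
      . '<li class="price"><span>199.99</span></li>'
      . '<li class="stock"><span>In stock</span></li>'
      . '</ul></div></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// A positional path breaks as soon as one index does not line up with the page.
$absolute = '/html/body/div[4]/div[3]/ul/li[3]/span';
// A shorter query keyed on an attribute is easier to read and to verify.
$relative = '//li[@class="stock"]/span';

foreach ([$absolute, $relative] as $expr) {
    $nodes = $xpath->query($expr);
    echo $expr, ' => ', $nodes->length, " match(es)\n";
}
```

If the count is zero, shorten the expression step by step until it matches, then refine it from there.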
