HTML Agility Pack - using XPath to get a single node - Object Reference not set to an instance of an object


Problem description

This is my first attempt to get an element value using HAP. I'm getting a null object error when I try to use InnerText.

The URL I am scraping is http://www.mypivots.com/dailynotes/symbol/659/-1/e-mini-sp500-june-2013 and I am trying to get the value for the current high from the Day Change Summary table.

My code is at the bottom. Firstly, I would just like to know whether I am going about this the right way. If so, is it simply that my XPath value is incorrect?

The XPath value was obtained using a utility I found called htmlagility helper. The Firebug version of the XPath, shown here, gives the same error: /html/body/div[3]/div/table/tbody/tr[3]/td/table/tbody/tr[5]/td[3]

My code:

WebClient myPivotsWC = new WebClient();
string nodeValue;
string htmlCode = myPivotsWC.DownloadString("http://www.mypivots.com/dailynotes/symbol/659/-1/e-mini-sp500-june-2013");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
HtmlNode node = doc.DocumentNode.SelectSingleNode("/html[1]/body[1]/div[3]/div[1]/table[1]/tbody[1]/tr[3]/td[1]/table[1]/tbody[1]/tr[5]/td[3]");
nodeValue=(node.InnerText);

Thanks, Will.

Recommended answer

You can't rely on developer tools such as Firebug or Chrome to determine the XPath for the nodes you're after, because the XPath those tools give corresponds to the in-memory HTML DOM, while the Html Agility Pack only knows about the raw HTML sent back by the server.

What you need to do is look at what is actually sent back (or just do a View Source). You'll see, for example, that there is no TBODY element. Because your path asks for a TBODY that isn't in the raw HTML, SelectSingleNode matches nothing and returns null, and reading InnerText on that null is what throws. So you want to find something discriminating in the markup and use, for example, XPath axes. Also, your XPath, even if it worked, would not be very resistant to changes in the document, so you need to find something more "stable" to make the scraping more future-proof.
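To illustrate the TBODY point, here is a minimal sketch that runs two paths against the same markup. The HTML string is a made-up stand-in for a raw server response (it is not the mypivots.com page), and `TbodyDemo` and `TextOrNull` are hypothetical names introduced only for this example.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class TbodyDemo
{
    // Hypothetical helper: returns the InnerText of the first node matching
    // xpath, or null when SelectSingleNode finds nothing.
    public static string TextOrNull(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    static void Main()
    {
        // Assumed raw markup as a server might send it: the table has no <tbody>.
        const string rawHtml =
            "<html><body><table><tr><td>1</td><td>2</td></tr></table></body></html>";

        // Browser-style path that includes tbody, as a DOM inspector would report it.
        string viaTbody = TextOrNull(rawHtml, "/html/body/table/tbody/tr/td[2]");

        // Path that follows the markup actually present in the string.
        string viaRaw = TextOrNull(rawHtml, "/html/body/table/tr/td[2]");

        Console.WriteLine(viaTbody ?? "tbody path: no match");
        Console.WriteLine(viaRaw ?? "raw path: no match");
    }
}
```

Checking the result for null before touching InnerText, as the helper does, turns the exception into a condition you can handle or log.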

Here is some code that seems to work:

HtmlNode node = doc.DocumentNode.SelectSingleNode("//td[@class='dnTableCell']//a[text()='High']/../../td[3]");

Here is what it does:

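  • find a TD element with a CLASS attribute set to 'dnTableCell'. The // token means the search is recursive in the XML hierarchy.
  • find an A element whose text (inner text) equals 'High'.
  • navigate two parents up (we'll get to the closest TR element)
  • select the third TD element from there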
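The steps above can be tried against a small fragment. This is only a sketch: the sample rows, their numbers, and the names `HighValueDemo` and `SelectText` are invented for illustration and are not taken from the real page, whose markup may differ.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class HighValueDemo
{
    // Hypothetical helper: trimmed InnerText of the first match, or null if
    // the XPath matches nothing (SelectSingleNode returns null in that case).
    public static string SelectText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    static void Main()
    {
        // Assumed structure: each row has a label cell holding a link,
        // a filler cell, and a value cell. All values are made up.
        const string sample =
            "<table>" +
            "<tr><td class='dnTableCell'><a href='#'>Open</a></td><td>-</td><td>1640.00</td></tr>" +
            "<tr><td class='dnTableCell'><a href='#'>High</a></td><td>-</td><td>1650.25</td></tr>" +
            "</table>";

        // Same expression as in the answer: locate the 'High' link, go up
        // to its row, then take that row's third cell.
        const string xpath =
            "//td[@class='dnTableCell']//a[text()='High']/../../td[3]";

        string high = SelectText(sample, xpath);
        Console.WriteLine(high ?? "High cell not found");
    }
}
```

Anchoring on the label text and class rather than on positional indexes from the document root is what makes the expression less sensitive to layout changes elsewhere on the page.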
