Can't get XPATH working with Html Agility Pack

Question

I'm trying to scrape the "Today's featured article" on Wikipedia by copying its XPath from Firebug and pasting it into my code:

using System.Net;

var wc = new WebClient();
string result = wc.DownloadString("http://en.wikipedia.org/wiki/Main_Page");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);

// Absolute XPath copied from Firebug
var featuredArticle = doc.DocumentNode.SelectSingleNode("/html/body/div[3]/div[3]/div[4]/table[2]/tbody/tr/td/table/tbody/tr[2]/td/div/p");

However, featuredArticle is always null. What am I doing wrong?

Answer

Firebug shows the XPath of the DOM as Firefox built it, which may or may not match the HTML the server actually sends. A common difference is <tbody>: browsers insert a <tbody> element into every table that lacks one, while Html Agility Pack parses the markup as sent, so a path containing tbody can match nothing. Also, the path from Firebug is absolute, so every small change to the page can break it.
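To see why the copied path fails, here is a minimal sketch. It swaps in .NET's built-in System.Xml XPath engine for Html Agility Pack so it runs without extra packages, and it uses a hypothetical, hand-written, well-formed fragment (ServerHtml) in place of the real Wikipedia markup. Only the presence or absence of <tbody> matters here; the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

#nullable enable

public static class TbodyDemo
{
    // Hypothetical fragment standing in for the server's HTML: the table has no <tbody>.
    public const string ServerHtml =
        "<html><body><table><tr><td>" +
        "<div id='mp-tfa'><p>Featured text</p></div>" +
        "</td></tr></table></body></html>";

    // Returns the InnerText of the first node matching xpath, or null if nothing matches.
    public static string? FindText(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(ServerHtml);
        return doc.SelectSingleNode(xpath)?.InnerText;
    }

    public static void Main()
    {
        // Browser-style path: includes the <tbody> Firefox inserted, which is not in the source.
        Console.WriteLine(FindText("/html/body/table/tbody/tr/td/div/p") ?? "(null)");
        // Same absolute path without <tbody>: matches the markup as sent.
        Console.WriteLine(FindText("/html/body/table/tr/td/div/p") ?? "(null)");
        // Relative path anchored on the id: independent of <tbody> and of the exact nesting.
        Console.WriteLine(FindText("//div[@id='mp-tfa']/p") ?? "(null)");
    }
}
```

Running it should print (null) for the browser-style path and Featured text for the other two. The id-based path is the more robust choice, because it keeps working even if the surrounding layout changes.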

An easier way is to look at the HTML itself. The <p> tag you are looking for is inside a div with the id mp-tfa, so make the XPath look for that div and then take the first p inside it.

Like this:

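using System;
using System.Net;
using HtmlAgilityPack;

using var wc = new WebClient();
var doc = new HtmlDocument();
doc.Load(wc.OpenRead("http://en.wikipedia.org/wiki/Main_Page"));
// Find the div by its id instead of an absolute path, then take its first <p> child
var featuredArticle = doc.DocumentNode.SelectSingleNode("//div[@id='mp-tfa']/p");
Console.WriteLine(featuredArticle?.InnerText ?? "Featured article not found");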

The best place to learn how to use XPath is w3schools.com.

Alternatively, you could use LINQ, though I find XPath a bit clearer:

using System.Linq;

var featuredArticle = doc.DocumentNode.Descendants("div")
    .First(n => n.Id == "mp-tfa")   // throws if no div has this id
    .Descendants("p")
    .FirstOrDefault();
