Html Agility Pack Empty Values out of Tables

Problem description

I am trying to learn some basic scraping, and thanks to this site I have been able to learn a lot of new things, but now I am stuck with this problem. This is the code I am using:

var web = new HtmlWeb();
var doc = web.Load("url");
var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
StreamWriter output = new StreamWriter("out.txt");

if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        if (item != null && item.Attributes["data-recommended"] != null)
        {
            string line = "";
            var nome = item.SelectSingleNode(".//h3/a").InnerText;
            var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
            var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
            var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
            line = line + nome + "," + rating + "," + price + "," + discount;
            Console.WriteLine(line);
            output.WriteLine(line);
        }
    }
}

It all works fine for the first two items (name and rating), but when it comes to price and discount I get empty results. I have analyzed the page (here is the link) with Chrome Scraper, and it gets the results easily with the XPath I have used. I don't understand what I am doing wrong. Any help would be appreciated! :D
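
A side note on why the symptom is empty fields rather than a crash: in the code above, `price` and `discount` are `HtmlNode` references (the return value of `SelectSingleNode`), not strings, and C# string concatenation turns a null operand into an empty string. The minimal sketch below reproduces only that step with standard types; the `object?` variables set to `null` stand in for `SelectSingleNode` calls that matched nothing, and the sample values are hypothetical.

```csharp
using System;

// Stand-ins for SelectSingleNode results that matched nothing.
// In the question, price and discount are HtmlNode references, not strings.
object? price = null;
object? discount = null;

// Hypothetical sample values for the two fields that were found.
string nome = "Hotel Example";
string rating = "8.5";

// C# string concatenation treats a null operand as "", so the missing
// nodes silently become empty fields instead of throwing.
string line = nome + "," + rating + "," + price + "," + discount;
Console.WriteLine(line); // prints: Hotel Example,8.5,,
```

So an empty price or discount column means the XPath returned no node for that item; it also shows why the text has to come from `.InnerText` rather than from concatenating the node itself.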

Recommended answer

After a quick look at the web page you're trying to scrape, I found that not all items have price and discount information. You need to handle this case properly to avoid an exception, for example by checking for null before reading InnerText. With this slight change, your code was able to get price and discount information where available:

if (item != null && item.Attributes["data-recommended"] != null)
{
    string line = "";
    var nome = item.SelectSingleNode(".//h3/a").InnerText;
    var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
    var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
    var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
    //set priceString to empty string if price is null, else set it to price.InnerText
    var priceString = price == null ? "" : price.InnerText;
    //do similar step for discountString
    var discountString = discount == null ? "" : discount.InnerText;
    line = line + nome + "," + rating + "," + priceString + "," + discountString;
    Console.WriteLine(line);
    output.WriteLine(line);
}
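
The same null check can be applied to every field and tried offline. The sketch below is an illustration under stated assumptions: it needs the HtmlAgilityPack NuGet package, the HTML is a hypothetical simplified stand-in for the real page (so the price and discount XPath expressions differ from the ones above), and `TextOrEmpty` is a hypothetical helper name. It uses the null-conditional `?.` and null-coalescing `??` operators, which express `price == null ? "" : price.InnerText` in one line.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack" (assumed to be referenced)

// Hypothetical helper: trimmed text of the node, or "" when
// SelectSingleNode found nothing (node == null).
string TextOrEmpty(HtmlNode? node) => node?.InnerText.Trim() ?? "";

// Hypothetical, simplified markup: the second hotel has no price or discount.
const string html = @"
<div id='hotellist_inner'>
  <div data-recommended='1'>
    <h3><a>Hotel A</a></h3>
    <span class='rating'>8.5</span>
    <strong class='price'>120</strong>
    <div class='discount'>-10%</div>
  </div>
  <div data-recommended='1'>
    <h3><a>Hotel B</a></h3>
    <span class='rating'>7.9</span>
  </div>
</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var lines = new List<string>();
var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
if (nodes != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode item in nodes)
    {
        if (item.Attributes["data-recommended"] == null) continue;

        // Every lookup goes through TextOrEmpty, so a missing element
        // yields an empty field instead of a NullReferenceException.
        var nome = TextOrEmpty(item.SelectSingleNode(".//h3/a"));
        var rating = TextOrEmpty(item.SelectSingleNode(".//span[@class='rating']"));
        var price = TextOrEmpty(item.SelectSingleNode(".//strong[@class='price']"));
        var discount = TextOrEmpty(item.SelectSingleNode(".//div[@class='discount']"));

        var line = string.Join(",", nome, rating, price, discount);
        lines.Add(line);
        Console.WriteLine(line);
    }
}
```

On the sample markup this prints `Hotel A,8.5,120,-10%` and then `Hotel B,7.9,,`: the second hotel keeps its row, with empty price and discount fields. Unlike the answer's version, it also guards `nome` and `rating`. Whether an empty field or skipping the row is the better fallback depends on what reads the output. If you also write to out.txt as in the question, wrap the `StreamWriter` in a `using` block so its buffered output is flushed when the writer is disposed.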
