Parsing an HTML page with HtmlAgilityPack using LINQ

Question

How can I parse the HTML of a web page using LINQ and add the extracted values to a string? I am using HtmlAgilityPack in a Metro application and would like to bring back 3 values and add them to a string.

This is the URL: http://explorer.litecoin.net/address/Li7x5UZqWUy7o2t1

I would like to get the following values from the page:

"Balance:", "Transactions in", "Received"

WebResponse x = await req.GetResponseAsync();
HttpWebResponse res = (HttpWebResponse)x;
if (res != null)
{
    if (res.StatusCode == HttpStatusCode.OK)
    {
        Stream stream = res.GetResponseStream();
        using (StreamReader reader = new StreamReader(stream))
        {
            html = reader.ReadToEnd();
        }
        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        string appName = htmlDocument.DocumentNode.Descendants // not sure what t
        string a = "Name: " + WebUtility.HtmlDecode(appName);
    }
}

Recommended answer

Please try the following. You might also consider pulling the table apart instead, as it is somewhat better formed than the free text in the 'p' tag.

Cheers, Aaron.

// requires: using System; using System.Linq; using System.Net; using HtmlAgilityPack;

// download the site content and create a new html document
// NOTE: make this asynchronous etc when considering IO performance
var url = "http://explorer.litecoin.net/address/Li7x5UZqWUy7o1tEC2x5o6cNsn2bmDxA2N";
var data = new WebClient().DownloadString(url);
var doc = new HtmlDocument();
doc.LoadHtml(data);

// extract the 'Transactions' h3 title; the node we want sits directly before it
var transTitle = 
    (from h3 in doc.DocumentNode.Descendants("h3")
     where h3.InnerText.ToLower() == "transactions"
     select h3).FirstOrDefault();

// FirstOrDefault returns null when no heading matches, so fail with a clear message
if (transTitle == null)
    throw new InvalidOperationException("No 'Transactions' heading found in the page.");

// tokenise the summary: one line per 'br' element, each line split on the ':' symbol
var summary = transTitle.PreviousSibling.PreviousSibling;
var tokens = 
    (from row in summary.InnerHtml.Replace("<br>", "|").Split('|')
     where !string.IsNullOrEmpty(row.Trim())
     let line = row.Trim().Split(':')
     where line.Length == 2
     select new { name = line[0].Trim(), value = line[1].Trim() });

// using LINQPad to debug: the Dump command writes the current variable to the output
tokens.Dump();
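
The NOTE in the code above suggests making the download asynchronous. Here is a minimal sketch of one way to do that with `HttpClient` from `System.Net.Http`, which may also be the more practical choice if `WebClient` turns out not to be available in your Metro project (an assumption worth checking for your target). `PageDownloader` and `DownloadHtmlAsync` are illustrative names, not part of HtmlAgilityPack.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of HtmlAgilityPack or the original answer.
public static class PageDownloader
{
    // A single shared HttpClient is reused across calls.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the page body as a string without blocking the calling thread.
    public static async Task<string> DownloadHtmlAsync(Uri address)
    {
        if (address == null)
            throw new ArgumentNullException(nameof(address));

        using (var response = await Client.GetAsync(address))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The returned string can then be passed to `doc.LoadHtml(...)` in place of `data`, for example `var data = await PageDownloader.DownloadHtmlAsync(new Uri(url));` from inside an `async` method.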

In the answer's code, 'Dump()' is a LINQPad command that writes the variable to the output pane. The following is a sample of the output from the Dump command:

  • Balance: 5 LTC
  • Transactions in: 2
  • Received: 5 LTC
  • Transactions out: 0
  • Sent: 0 LTC
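
The question asked for three of these values joined into a single string. Below is a minimal sketch of that last step. It assumes the summary markup resembles the hypothetical `sampleSummary` fragment (the real page may differ) and works on the inner HTML with plain LINQ, so it can be tried without HtmlAgilityPack. `SummaryParser`, `ParseSummary`, and `Describe` are illustrative names, not library APIs. Unlike the answer's query, it splits each row on the first ':' only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; names are not from the original answer.
public static class SummaryParser
{
    // Turns "name: value<br>name: value" into a case-insensitive lookup.
    // Note: ToDictionary throws if the same name appears twice.
    public static Dictionary<string, string> ParseSummary(string innerHtml)
    {
        return innerHtml
            .Split(new[] { "<br>", "<br/>", "<br />" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(row => row.Trim().Split(new[] { ':' }, 2)) // split on the first ':' only
            .Where(parts => parts.Length == 2 && parts[0].Trim().Length > 0)
            .ToDictionary(parts => parts[0].Trim(),
                          parts => parts[1].Trim(),
                          StringComparer.OrdinalIgnoreCase);
    }

    // Joins only the requested fields, skipping any the page did not provide.
    public static string Describe(IDictionary<string, string> values, params string[] wanted)
    {
        return string.Join(", ",
            wanted.Where(key => values.ContainsKey(key))
                  .Select(key => key + ": " + values[key]));
    }

    public static void Main()
    {
        // Assumed fragment, shaped like the sample Dump output above.
        var sampleSummary = "Balance: 5 LTC<br>Transactions in: 2<br>Received: 5 LTC<br>" +
                            "Transactions out: 0<br>Sent: 0 LTC";

        var values = ParseSummary(sampleSummary);
        Console.WriteLine(Describe(values, "Balance", "Transactions in", "Received"));
    }
}
```

With the real page, `summary.InnerHtml` from the answer's code would take the place of `sampleSummary`. Looking fields up by name keeps the final string stable even if the page reorders its rows.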
