HTML Agility Pack - parsing tables


Question

I want to use the HTML Agility Pack to parse tables from complex web pages, but I am somehow lost in the object model.

I looked at the linked example, but did not find any table data handled this way. Can I use XPath to get the tables? After loading the data, I am basically lost as to how to get at the tables. I have done this in Perl before (HTML::TableParser); it was a bit clumsy, but it worked.

I would also be happy if someone could just shed some light on the right object order for the parsing.

Recommended answer

How about something like this (using the HTML Agility Pack: http://www.codeplex.com/htmlagilitypack):

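// Requires the HtmlAgilityPack assembly, plus "using System;" and "using HtmlAgilityPack;".
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");
// "//table" matches every <table> element in the document.
// Note: SelectNodes returns null when nothing matches, so guard against that on real pages.
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table")) {
    Console.WriteLine("Found: " + table.Id);
    // "tr" matches only rows that are direct children of the table.
    foreach (HtmlNode row in table.SelectNodes("tr")) {
        Console.WriteLine("row");
        // "th|td" matches both header and data cells of the row.
        foreach (HtmlNode cell in row.SelectNodes("th|td")) {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}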

Note that you can make it prettier with LINQ-to-Objects if you want:

var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
            from row in table.SelectNodes("tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new {Table = table.Id, CellText = cell.InnerText};

foreach(var cell in query) {
    Console.WriteLine("{0}: {1}", cell.Table, cell.CellText);
}
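
On real pages, two details of the snippets above tend to matter: SelectNodes returns null rather than an empty collection when nothing matches, and rows are often wrapped in thead/tbody/tfoot, which the relative "tr" path does not reach. The following is only a minimal sketch of one way to handle both. The class name TableSketch and the method ReadTables are hypothetical helpers made up for illustration and are not part of the HTML Agility Pack; only HtmlDocument, SelectNodes, InnerText and HtmlEntity.DeEntitize come from the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableSketch
{
    // Hypothetical helper (not a library API): returns tables -> rows -> cell texts.
    public static List<List<List<string>>> ReadTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<List<List<string>>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return result;

        foreach (HtmlNode table in tables)
        {
            var rows = new List<List<string>>();

            // Rows directly under the table, or under thead/tbody/tfoot.
            // Rows of nested tables are left to the nested table's own iteration.
            var trs = table.SelectNodes("tr|thead/tr|tbody/tr|tfoot/tr");
            if (trs != null)
            {
                foreach (HtmlNode tr in trs)
                {
                    var cells = tr.SelectNodes("th|td");
                    rows.Add(cells == null
                        ? new List<string>()
                        : cells.Cast<HtmlNode>()
                               // Decode entities such as &amp; and trim whitespace.
                               .Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim())
                               .ToList());
                }
            }
            result.Add(rows);
        }
        return result;
    }
}
```

Calling TableSketch.ReadTables(html) and printing each row with string.Join(" | ", row) gives a quick view of what was extracted. Returning plain lists keeps the sketch independent of any particular output format.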
