HTML Agility Pack

Views: 126 Published: 2015/11/24 22:33:16 Tags: c# .net winforms html-parsing html-agility-pack


Question

I want to parse an HTML table using the HTML Agility Pack, and I want to extract the data from only some predefined columns of the table.

However, I am new to parsing and to the HTML Agility Pack. I have tried, but I don't know how to use it for what I need.

If anybody knows how, please give me an example if possible.

Edit:

Is it possible to parse an HTML table so that only the data for chosen column names is extracted? For example, the table has 4 columns, among them name, address, and phno, and I want to extract only the name and address data.

Recommended answer

There is an example of this in the discussion forums here. Scroll down a bit to see the answer about tables. I do wish they provided better samples that were easier to find.

Edit: To extract data from specific columns, you first have to find the <th> tags that correspond to the columns you want and remember their indexes. You then need to find the <td> tags at the same indexes in each row. Assuming you already know the indexes of the columns, you could do something like this:

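using HtmlAgilityPack;

// LoadHtml expects markup, not a URL; HtmlWeb downloads and parses the page.
HtmlDocument doc = new HtmlWeb().Load("http://somewhere.com");
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
// ".//tr" stays inside this table; "//tr" would match every row in the document.
foreach (HtmlNode row in table.SelectNodes(".//tr"))
{
    // XPath positions are 1-based. Header rows have no <td>, so check for null before use.
    HtmlNode addressNode = row.SelectSingleNode("td[2]");
    // do something with address here
    HtmlNode phoneNode = row.SelectSingleNode("td[5]");
    // do something with phone here
}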
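
To try the fixed-index approach without a network request, here is a minimal sketch that parses an inline HTML string. The sample markup, the class name FixedColumnDemo, and the helper ReadColumn are made up for illustration and are not part of the library. Note that LoadHtml takes markup, not a URL.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FixedColumnDemo
{
    // Hypothetical sample markup; any table with at least two columns works.
    private const string SampleHtml =
        "<table>" +
        "<tr><th>name</th><th>address</th><th>phno</th></tr>" +
        "<tr><td>Ann</td><td>1 Main St</td><td>555-0100</td></tr>" +
        "<tr><td>Bob</td><td>2 Oak Ave</td><td>555-0101</td></tr>" +
        "</table>";

    // Returns the trimmed text of the given 1-based column for every data row
    // of the first table in the markup.
    public static List<string> ReadColumn(string html, int column)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a markup string, not a URL
        var values = new List<string>();

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return values;

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (rows == null) return values;

        foreach (HtmlNode row in rows)
        {
            HtmlNode cell = row.SelectSingleNode("td[" + column + "]");
            if (cell != null) // header rows contain <th>, not <td>
            {
                values.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
            }
        }
        return values;
    }

    public static void Main()
    {
        // Column 2 is the address column in the sample table.
        foreach (string address in ReadColumn(SampleHtml, 2))
        {
            Console.WriteLine(address);
        }
    }
}
```

The null checks matter because a row that has only <th> cells yields no <td> match, and calling InnerText on a null node would throw.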

Edit 2: If you don't know the indexes of the columns, you could do the whole thing like this. I have not tested this.

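using HtmlAgilityPack;

// LoadHtml expects markup, not a URL; HtmlWeb downloads and parses the page.
HtmlDocument doc = new HtmlWeb().Load("http://somewhere.com");
// SelectNodes returns null when nothing matches, so check before iterating in real code.
var tables = doc.DocumentNode.SelectNodes("//table");

foreach (var table in tables)
{
    int addressIndex = -1;
    int phoneIndex = -1;
    // ".//th" searches only inside this table; "//th" would search the whole document.
    var headers = table.SelectNodes(".//th");
    if (headers == null)
    {
        continue; // this table has no header cells
    }

    for (int headerIndex = 0; headerIndex < headers.Count; headerIndex++)
    {
        // Trim so that surrounding whitespace in the markup does not break the match.
        string headerText = headers[headerIndex].InnerText.Trim();
        if (headerText == "address")
        {
            addressIndex = headerIndex;
        }
        else if (headerText == "phone")
        {
            phoneIndex = headerIndex;
        }
    }

    if (addressIndex != -1 && phoneIndex != -1)
    {
        foreach (var row in table.SelectNodes(".//tr"))
        {
            // The index must be inserted into the XPath string, and XPath positions
            // are 1-based while the loop index above is 0-based, hence the + 1.
            HtmlNode addressNode = row.SelectSingleNode($"td[{addressIndex + 1}]");
            // do something with address here (null for rows without <td> cells)
            HtmlNode phoneNode = row.SelectSingleNode($"td[{phoneIndex + 1}]");
            // do something with phone here (null for rows without <td> cells)
        }
    }
}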
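
For the case in the question (keep only name and address out of several columns), the header lookup can be wrapped in a small helper. This is a sketch under stated assumptions, not a tested recipe: the class NamedColumnDemo, the helper ReadColumns, and the sample markup are hypothetical, and it assumes the first table has one <th> per column and no colspan.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class NamedColumnDemo
{
    // Hypothetical helper: for each data row of the first table, returns a map
    // from each requested header name to the text of the matching cell.
    public static List<Dictionary<string, string>> ReadColumns(string html, params string[] wanted)
    {
        var result = new List<Dictionary<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
        if (table == null) return result;
        HtmlNodeCollection headers = table.SelectNodes(".//th");
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        if (headers == null || rows == null) return result;

        // Remember the 1-based XPath position of every wanted header.
        var positions = new Dictionary<string, int>();
        for (int i = 0; i < headers.Count; i++)
        {
            string text = headers[i].InnerText.Trim();
            foreach (string name in wanted)
            {
                if (string.Equals(name, text, StringComparison.OrdinalIgnoreCase))
                {
                    positions[name] = i + 1;
                }
            }
        }

        foreach (HtmlNode row in rows)
        {
            var record = new Dictionary<string, string>();
            foreach (KeyValuePair<string, int> pair in positions)
            {
                HtmlNode cell = row.SelectSingleNode("td[" + pair.Value + "]");
                if (cell != null) record[pair.Key] = cell.InnerText.Trim();
            }
            if (record.Count > 0) result.Add(record); // header rows yield no <td> cells
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample data for illustration only.
        const string html =
            "<table><tr><th>name</th><th>address</th><th>phno</th></tr>" +
            "<tr><td>Ann</td><td>1 Main St</td><td>555-0100</td></tr></table>";

        foreach (Dictionary<string, string> record in ReadColumns(html, "name", "address"))
        {
            Console.WriteLine(record["name"] + ", " + record["address"]);
        }
    }
}
```

Keying the result by the requested names rather than by position keeps the calling code independent of the column order in the page.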
