HTML Agility Pack: parse a table

Problem description

I have a table like this:

<table border="0" cellpadding="0" cellspacing="0" id="table2">
    <tr>
        <th>Name
        </th>
        <th>Age
        </th>
    </tr>
        <tr>
        <td>Mario
        </td>
        <th>Age: 78
        </td>
    </tr>
            <tr>
        <td>Jane
        </td>
        <td>Age: 67
        </td>
    </tr>
            <tr>
        <td>James
        </td>
        <th>Age: 92
        </td>
    </tr>
</table>

I am using HTML Agility Pack to parse it. I have tried the following code, but it does not return the expected results:

foreach (HtmlNode tr in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr"))
{
    // looping on each row, get col1 and col2 of each row
    HtmlNodeCollection tds = tr.SelectNodes("td");
    for (int i = 0; i < tds.Count; i++)
    {
        Response.Write(tds[i].InnerText);
    }
}

I am retrieving each column separately because I want to do some processing on the returned contents.

What am I doing wrong?

Recommended answer

You can grab the cell contents in a single foreach loop, without an inner loop over each row:

// Select every <td> cell in table2 directly, so no per-row inner loop is needed
foreach (HtmlNode td in doc.DocumentNode.SelectNodes("//table[@id='table2']//tr//td"))
{
    Response.Write(td.InnerText);
}

I'd also recommend trimming and de-entitizing the inner text (decoding HTML entities such as &amp; into plain characters) to ensure it is clean:

Response.Write(HtmlEntity.DeEntitize(td.InnerText).Trim());

In your source, the cells for [Age: 78] and [Age: 92] open with a <th> tag instead of <td> (while still closing with </td>), so an XPath that selects only td elements skips them.
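If the markup cannot be corrected, the per-row loop from the question can be adapted to accept both cell types. The following is a minimal sketch, not the original answer's code: it assumes the HtmlAgilityPack NuGet package, and TableParser and ParseRows are hypothetical names chosen for illustration. It selects td|th in each row, guards against the null that SelectNodes returns by default when nothing matches (the header row has no td cells, for example), and applies the DeEntitize/Trim cleanup recommended in the answer.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumed: HtmlAgilityPack NuGet package

public static class TableParser
{
    // Hypothetical helper: returns the trimmed, de-entitized text of every
    // <td> or <th> cell in the table with the given id, one list per row.
    public static List<List<string>> ParseRows(string html, string tableId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // By default SelectNodes returns null, not an empty collection,
        // when the XPath matches nothing.
        HtmlNodeCollection trs = doc.DocumentNode.SelectNodes($"//table[@id='{tableId}']//tr");
        if (trs == null)
            return rows;

        foreach (HtmlNode tr in trs)
        {
            var cells = new List<string>();

            // "td|th" selects both child cell types, so header-style cells
            // such as <th>Age: 78</th> are not skipped.
            HtmlNodeCollection tds = tr.SelectNodes("td|th");
            if (tds != null)
            {
                foreach (HtmlNode cell in tds)
                    cells.Add(HtmlEntity.DeEntitize(cell.InnerText).Trim());
            }

            rows.Add(cells);
        }

        return rows;
    }
}
```

For the table in the question, TableParser.ParseRows(html, "table2") would be expected to return one list of cell texts per row, header row included; exactly how the malformed <th>...</td> cells are repaired is up to the parser, so correcting the markup remains the cleaner fix.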
