Using LINQ to XML to traverse an HTML table


Question

So, I can easily use LINQ to XML to traverse a properly set-up XML document. But I'm having some issues figuring out how to apply it to an HTML table. Here is the setup:

<table class='inner'
       width='100%'>
    <tr>
        <th>Area</th>
        <th>Date</th>
        <th>ID</th>
        <th>Name</th>
        <th>Email</th>
        <th>Zip Code</th>
        <th>Type</th>
        <th>Amount</th>
    </tr>
    <tr>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
    </tr>
    <tr>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
        <td>Data</td>
    </tr>
</table>

Essentially, there can be any number of rows, and I want to go through them row by row to check the data. Can anyone point me in the right direction? Should I be using tools other than LINQ for this?

EDIT: Sorry about the confusion. My issue is that the page I am trying to gather data from is HTML, not XML. The exact extension is ".aspx.htm". This doesn't seem to load properly, and even if it did, I'm not certain how to traverse the HTML page, given that there is one table before the table I'm trying to get data from.

For example, here is the XPath to the table I'm trying to get info from:

/html/body/form/div[3]/table/tbody/tr[5]/td/table

Solution

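// Find the first <table> whose class is "inner". Casting the attribute to string
// returns null (rather than throwing) for tables that have no class attribute.
XElement myTable = xdoc.Descendants("table").FirstOrDefault(xelem => (string)xelem.Attribute("class") == "inner");
// Each child element of the table is a row; each child of a row is a cell.
// Note: myTable is null here if no matching table was found.
IEnumerable<IEnumerable<XElement>> myRows = myTable.Elements().Select(xelem => xelem.Elements());

foreach (IEnumerable<XElement> tableRow in myRows)
{
    foreach (XElement rowCell in tableRow)
    {
        // tada.. rowCell.Value holds the text of the current cell
    }
}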
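
For reference, here is a minimal end-to-end sketch of the same idea, written as a C# top-level program. It assumes the page can be parsed as well-formed XML and declares no default XML namespace; markup that is not well-formed makes XDocument.Parse throw, in which case the page would first have to be cleaned up by an HTML parser. The sample markup, the cell values, and the ReadDataRows helper are hypothetical and only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample input: a well-formed copy of the question's layout,
// with an unrelated table placed before the one we want.
string markup = @"<html><body>
<table><tr><td>ignored</td></tr></table>
<table class='inner' width='100%'>
  <tr><th>Area</th><th>Date</th></tr>
  <tr><td>North</td><td>2010-01-01</td></tr>
  <tr><td>South</td><td>2010-01-02</td></tr>
</table>
</body></html>";

List<List<string>> rows = ReadDataRows(markup, "inner");
foreach (List<string> row in rows)
{
    // Prints "North | 2010-01-01" and then "South | 2010-01-02".
    Console.WriteLine(string.Join(" | ", row));
}

// ReadDataRows is a hypothetical helper, not part of LINQ to XML.
// It returns the text of every <td> cell, one inner list per data row.
static List<List<string>> ReadDataRows(string xhtml, string tableClass)
{
    // Throws System.Xml.XmlException if the markup is not well-formed XML.
    XDocument xdoc = XDocument.Parse(xhtml);

    // The string cast is null-safe, so tables without a class are skipped.
    XElement table = xdoc.Descendants("table")
        .FirstOrDefault(t => (string)t.Attribute("class") == tableClass);
    if (table == null)
    {
        return new List<List<string>>();
    }

    // Rows may sit directly under <table> or inside a <tbody>;
    // rows of nested tables are deliberately not included.
    IEnumerable<XElement> tableRows = table.Elements("tr")
        .Concat(table.Elements("tbody").Elements("tr"));

    return tableRows
        .Where(tr => tr.Elements("td").Any()) // skips the <th> header row
        .Select(tr => tr.Elements("td").Select(td => td.Value.Trim()).ToList())
        .ToList();
}
```

Selecting the table by its class attribute, rather than by position, avoids having to count the tables that come before it on the page.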
