How to extract the innermost table from an HTML file with the Html Agility Pack?

Question

I am parsing tabular information from an HTML file with the Html Agility Pack.

For simple tables I can already do this, and it works.

The problem arises when the table I want to extract is the innermost one, or when I don't know how deeply it is nested. There can be any number of nested tables, and from them I want to extract the table whose column names are NAME and ADDRESS.

For example:

<table>
    <table>
           <tr><td>PHONE NO.</td><td>OTHER INFO.</td></tr>
           <tr><td>
              <table>
                 <tr><td>AMOUNT</td></tr>
                 <tr><td>50000</td></tr>
                 <tr><td>80000</td></tr>
              </table>
           </td></tr>
           <tr><td>
              <table>
                 <tr><td>
                     <table>
                         <tr><td>
                              <table>
                                 <tr><td> NAME </td><td>ADDRESS</td></tr>
                                 <tr><td> ABC  </td><td> kfks   </td></tr>
                                 <tr><td> BCD  </td><td> fdsa   </td></tr>
                              </table>
                         </td></tr>
                     </table>
                 </td></tr>
              </table>
           </td></tr>
        </table>
</table>

There are many tables, but I only want to extract the one whose column names are NAME and ADDRESS. How can I do that?

Recommended answer

var table = doc.DocumentNode.SelectSingleNode("//table [not(descendant::table) and tr[1]/td[normalize-space()='ADDRESS'] ]");
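The XPath above selects a table that contains no nested table (`not(descendant::table)`) and whose first row has a cell reading ADDRESS once surrounding whitespace is trimmed. As a minimal sketch of how it might be used end to end, the program below loads a small sample document and prints the data rows. The sample HTML, the `Program` class, and the extra check for a NAME cell are assumptions added for illustration; they are not part of the original answer. It also assumes the rows are direct children of the table with no `tbody` wrapper and that header cells are `td` rather than `th`.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample input, modelled on the nested tables in the question.
        const string html = @"
<table>
  <tr><td>PHONE NO.</td><td>OTHER INFO.</td></tr>
  <tr><td>
    <table>
      <tr><td>AMOUNT</td></tr>
      <tr><td>50000</td></tr>
    </table>
  </td></tr>
  <tr><td>
    <table>
      <tr><td>
        <table>
          <tr><td> NAME </td><td>ADDRESS</td></tr>
          <tr><td> ABC  </td><td> kfks   </td></tr>
          <tr><td> BCD  </td><td> fdsa   </td></tr>
        </table>
      </td></tr>
    </table>
  </td></tr>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // A table with no nested <table> whose first row has both header cells.
        var table = doc.DocumentNode.SelectSingleNode(
            "//table[not(descendant::table)" +
            " and tr[1]/td[normalize-space()='NAME']" +
            " and tr[1]/td[normalize-space()='ADDRESS']]");

        if (table == null)
        {
            Console.WriteLine("No matching table found.");
            return;
        }

        // Skip the header row; SelectNodes returns null when nothing matches.
        var rows = table.SelectNodes("tr[position() > 1]");
        if (rows == null) return;

        foreach (var row in rows)
        {
            var cells = row.SelectNodes("td");
            if (cells == null || cells.Count < 2) continue;
            Console.WriteLine($"{cells[0].InnerText.Trim()} | {cells[1].InnerText.Trim()}");
        }
    }
}
```

Note that the AMOUNT table is also innermost, so the header test is what tells the two apart. If the real document wraps rows in `tbody` or uses `th` for headers, the `tr[1]/td` steps would need adjusting accordingly.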
