Parsing HTML with the HTML Agility Pack and LINQ

Question

I have the following HTML:

(..)
<tbody>
 <tr>
  <td class="name"> Test1 </td>
  <td class="data"> Data </td>
  <td class="data2"> Data 2 </td>
 </tr>
 <tr>
  <td class="name"> Test2 </td>
  <td class="data"> Data2 </td>
  <td class="data2"> Data 2 </td>
 </tr>
</tbody>
(..)

The information I have is the name, i.e. "Test1" and "Test2". What I want to know is how I can get the data in the "data" and "data2" cells based on the name I have.

Currently I am using:

var data =
    from
        tr in doc.DocumentNode.Descendants("tr")
    from   
        td in tr.ChildNodes.Where(x => x.Attributes["class"].Value == "name")
    where
        td.InnerText == "Test1"
    select tr;

But when I try to view data

Recommended answer

As for your attempt, you have two issues with your code:

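  1. ChildNodes is weird: it also returns the whitespace text nodes between the tags, which don't have a class attribute (they can't have attributes at all, of course), so x.Attributes["class"] is null for them and reading .Value throws a NullReferenceException.
  2. As James Walford commented, the spaces around the text are significant (the cell text is " Test1 ", not "Test1"), so you probably want to trim them.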
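Both issues can be seen on a single row from the question. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class and method names (ChildNodesDemo, LoadFirstRow, ChildNodeNames, CellClasses, RowHasName) are hypothetical and only for illustration. It lists what ChildNodes returns for a row, then uses Elements("td") with GetAttributeValue to skip the text nodes, and Trim() for the name comparison.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class illustrating the two issues in the question's query.
public static class ChildNodesDemo
{
    // Parses an HTML string and returns its first <tr> element.
    public static HtmlNode LoadFirstRow(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants("tr").First();
    }

    // Issue 1: ChildNodes also contains the whitespace between the <td> tags as
    // "#text" nodes. Those have no attributes, so Attributes["class"] is null for them.
    public static List<string> ChildNodeNames(HtmlNode tr) =>
        tr.ChildNodes.Select(n => n.Name).ToList();

    // Elements("td") returns only the <td> element children, and
    // GetAttributeValue returns the default ("") instead of null when "class" is missing.
    public static List<string> CellClasses(HtmlNode tr) =>
        tr.Elements("td").Select(td => td.GetAttributeValue("class", "")).ToList();

    // Issue 2: InnerText keeps the spaces around " Test1 ", so compare after Trim().
    public static bool RowHasName(HtmlNode tr, string name) =>
        tr.Elements("td")
          .Where(td => td.GetAttributeValue("class", "") == "name")
          .Any(td => td.InnerText.Trim() == name);
}
```

Selecting by element name rather than filtering ChildNodes means the class check never runs against a text node, so it cannot hit the null attribute.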

With these two corrections, the following works:

var data =
      from tr in doc.DocumentNode.Descendants("tr")
      from td in tr.Descendants("td").Where(x => x.Attributes["class"].Value == "name")
     where td.InnerText.Trim() == "Test1"
    select tr;
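The query above selects the whole <tr>, while the question asks for the "data" and "data2" values of that row. One way to go the rest of the way is sketched below, assuming the HtmlAgilityPack NuGet package is referenced and that class names match exactly (a cell with class="data extra" would not match). RowLookup, FindRowData and CellText are hypothetical names. It uses Elements("td") instead of ChildNodes and GetAttributeValue("class", "") so that cells without a class attribute don't cause a NullReferenceException.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper: looks up a row by the text of its "name" cell.
public static class RowLookup
{
    // Returns the trimmed text of the "data" and "data2" cells of the first row
    // whose trimmed "name" cell equals `name`, or null if no row matches.
    public static (string Data, string Data2)? FindRowData(HtmlDocument doc, string name)
    {
        HtmlNode row = doc.DocumentNode
            .Descendants("tr")
            .FirstOrDefault(tr => CellText(tr, "name") == name);

        if (row == null)
            return null;

        return (CellText(row, "data"), CellText(row, "data2"));
    }

    // Trimmed InnerText of the first direct <td> child whose class attribute is
    // exactly `cssClass`; null if the row has no such cell.
    private static string CellText(HtmlNode tr, string cssClass) =>
        tr.Elements("td")
          .FirstOrDefault(td => td.GetAttributeValue("class", "") == cssClass)
          ?.InnerText.Trim();
}
```

After loading the page with HtmlDocument.LoadHtml (or Load for a file), RowLookup.FindRowData(doc, "Test1") gives ("Data", "Data 2") for the sample HTML, and null for a name that is not in the table.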
