Parsing HTML with the HTML Agility Pack and LINQ
Question
I have the following HTML:
(..)
<tbody>
<tr>
<td class="name"> Test1 </td>
<td class="data"> Data </td>
<td class="data2"> Data 2 </td>
</tr>
<tr>
<td class="name"> Test2 </td>
<td class="data"> Data2 </td>
<td class="data2"> Data 2 </td>
</tr>
</tbody>
(..)
The information I have is the name, i.e. "Test1" and "Test2". What I want to know is how I can get the data that's in "data" and "data2" based on the name I have.
Currently I am using:
var data =
from tr in doc.DocumentNode.Descendants("tr")
from td in tr.ChildNodes.Where(x => x.Attributes["class"].Value == "name")
where td.InnerText == "Test1"
select tr;
But this does not work when I try to look at data.
Answer
As for your attempt, you have two issues with your code:
- ChildNodes is weird: it also returns whitespace text nodes, which don't have a class attribute (text nodes can't have attributes, of course), so x.Attributes["class"] is null for them.
- As James Walford commented, the spaces around the text are significant, so you probably want to trim them.
With these two corrections, the following works:
var data =
from tr in doc.DocumentNode.Descendants("tr")
from td in tr.Descendants("td").Where(x => x.Attributes["class"].Value == "name")
where td.InnerText.Trim() == "Test1"
select tr;
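The query above returns the matching tr, but the question also asks for the values in "data" and "data2". A minimal sketch of one way to get them is below. It assumes the HtmlAgilityPack NuGet package is referenced; RowLookup, FindCells and SampleHtml are hypothetical names introduced here for illustration, not part of the original answer. It uses GetAttributeValue with a default value so that a td without a class attribute does not cause a null reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class RowLookup
{
    // Sample markup modelled on the question (wrapped in <table> for completeness).
    public const string SampleHtml = @"<table><tbody>
<tr><td class=""name""> Test1 </td><td class=""data""> Data </td><td class=""data2""> Data 2 </td></tr>
<tr><td class=""name""> Test2 </td><td class=""data""> Data2 </td><td class=""data2""> Data 2 </td></tr>
</tbody></table>";

    // Hypothetical helper: finds the first row whose "name" cell equals `name`
    // (after trimming) and returns the trimmed text of its cells keyed by class.
    // Returns null when no row matches. Assumes each class occurs at most once
    // per row; ToDictionary throws on duplicate keys.
    public static Dictionary<string, string> FindCells(string html, string name)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var row = doc.DocumentNode.Descendants("tr")
            .FirstOrDefault(tr => tr.Descendants("td").Any(td =>
                td.GetAttributeValue("class", "") == "name" &&
                td.InnerText.Trim() == name));

        if (row == null)
            return null;

        return row.Descendants("td")
            .Where(td => td.GetAttributeValue("class", "") != "")
            .ToDictionary(
                td => td.GetAttributeValue("class", ""),
                td => td.InnerText.Trim());
    }

    public static void Main()
    {
        var cells = FindCells(SampleHtml, "Test1");
        Console.WriteLine(cells["data"]);   // Data
        Console.WriteLine(cells["data2"]);  // Data 2
    }
}
```

Keying the result by class name keeps the lookup independent of column order. If a row can contain several cells with the same class, grouping them (for example with ToLookup) would be a safer choice than ToDictionary.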