Can't figure out how to parse using HTML Agility Pack
Problem description
I have the following chunk of HTML, but I can't figure out how to get the designated values:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body >
<form name="form1" method="post" action="" id="form1">
<div>
<table class="tableclass" >
<tbody>
<tr>
<tr>
<td colspan="5" class="myclass1"><span id="myclass2">value1</span></td>
</tr>
<tr id="idvalue" aa="1" class="myclass3a">
<td><a href="" target="_blank">value2</a></td>
<td>value3</td>
<td>value4</td>
<td>value5</td>
<td>value6</td>
</tr>
<tr id="idvalue" aa="2" class="myclass3b">
<td><a href="" target="_blank">value2</a></td>
<td>value3</td>
<td>value4</td>
<td>value5</td>
<td>value6</td>
</tr>
<tr id="idvalue" aa="3" class="myclass3c">
<td><a href="" target="_blank">value2</a></td>
<td>value3</td>
<td>value4</td>
<td>value5</td>
<td>value6</td>
</tr>
</tbody>
</table>
</div>
</form>
</body>
</html>
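Let me walk through this markup.
The page has a table whose first row has a slightly different format; from that row I want to extract value1. The remaining rows have a variety of classes and different id values, and from each of them, down to the end of the table, I want to extract value2, value3, value4, and value5.
Thanks for your time.
Solution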
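using System.Linq;
using HtmlAgilityPack;

// HtmlDocument.Load expects a file path or stream; to fetch a page by URL, use HtmlWeb.
// url is assumed to hold the address of the page.
var doc = new HtmlWeb().Load(url);
var table = doc.DocumentNode.SelectSingleNode("//table[@class='tableclass']");

// The markup opens with a stray, unclosed <tr>, so skip the first match;
// the next row holds value1.
var value1 = table.Descendants("tr").Skip(1)
                  .Select(tr => tr.InnerText.Trim())
                  .First();

// Every row after that holds value2..value6, one per <td>.
var theRest =
    from tr in table.Descendants("tr").Skip(2)
    let values = tr.Elements("td")
                   .Select(td => td.InnerText.Trim())
                   .ToList()
    select new
    {
        Value2 = values[0],
        Value3 = values[1],
        Value4 = values[2],
        Value5 = values[3],
        Value6 = values[4],
    };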
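As an alternative to counting rows with Skip, the same values could be selected with XPath. The following is only a sketch, not part of the original answer: it assumes value1 always sits in the span with id "myclass2" and that every data row carries the custom "aa" attribute, as in the question's markup. The HTML is inlined (trimmed to two data rows) so the snippet runs without a file or network access.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Trimmed copy of the table from the question, including the stray unclosed <tr>.
const string html = @"
<table class=""tableclass""><tbody>
<tr>
<tr><td colspan=""5"" class=""myclass1""><span id=""myclass2"">value1</span></td></tr>
<tr id=""idvalue"" aa=""1"" class=""myclass3a"">
  <td><a href="""" target=""_blank"">value2</a></td><td>value3</td><td>value4</td><td>value5</td><td>value6</td>
</tr>
<tr id=""idvalue"" aa=""2"" class=""myclass3b"">
  <td><a href="""" target=""_blank"">value2</a></td><td>value3</td><td>value4</td><td>value5</td><td>value6</td>
</tr>
</tbody></table>";

var doc = new HtmlDocument();
doc.LoadHtml(html); // LoadHtml parses a string directly

// Assumption: value1 lives in the span with id "myclass2".
var value1 = doc.DocumentNode
    .SelectSingleNode("//span[@id='myclass2']")
    ?.InnerText.Trim();

// Assumption: data rows are the <tr> elements that carry the "aa" attribute.
// SelectNodes returns null (not an empty collection) when nothing matches.
var rowNodes = doc.DocumentNode.SelectNodes("//table[@class='tableclass']//tr[@aa]");
var rows = (rowNodes ?? Enumerable.Empty<HtmlNode>())
    .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
    .ToList();

Console.WriteLine(value1);
foreach (var cells in rows)
    Console.WriteLine(string.Join(", ", cells));
```

Selecting by attribute rather than by position means the result does not depend on how the parser treats the unclosed first row, at the cost of depending on the "aa" attribute and the span id staying stable.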