Parsing HTML: Reading Option Tag Content with HtmlAgilityPack
Question
I am trying to use HtmlAgilityPack to parse HTML, but I am running into problems.
Sample HTML document:
<tr>
<td class="css_lokalita" colspan="4">
<select id="region" name="region">
<option value="0" selected>Všetky regiony</option>
<optgroup>Banskobystrický kraj</optgroup>
<option value="k_1" style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
<option value="1"> Banská Bystrica</option>
.
.
.
<option value="174"> CZ - Ústecký kraj</option>
<option value="175"> CZ - Zlínský kraj</option>
</select>
</td>
</tr>
<tr>
<td class="css_sfotkou" colspan="4">
<input type="checkbox" name="foto" value="1" id="foto" />
<label for="foto">Iba používatelia s fotkou</label>
</td>
</tr>
<tr>
<td class="css_miestnost" colspan="4">
<select name="akt-miest" id="onoffaci">
<option value="a_0">Všetci</option>
.
.
.
<optgroup label="Záľuby a záujmy">
<option value="m_1419307"> Bez Lásky</option>
.
.
.
<option value="m_1108016"> Drum N Bass</option>
</optgroup>
</select>
</td>
</tr>
For example, from
<option value="a_0">Všetci</option>
I need to get the value a_0 and the text Všetci.
So I first access the select element by its id:
var selectNode = htmlDoc.GetElementbyId("onoffaci");
Then I select all option nodes with XPath:
var nodes = selectNode.SelectNodes("//option");
And I read the values:
foreach (var node in nodes)
{
string roomName = node.NextSibling.InnerText;
string roomId = node.Attributes["value"].Value;
rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}
But I get values from another select (<select id="region" name="region">), which is at the top of the HTML code.
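A note on why this happens: in XPath, an expression that starts with // is absolute, so it searches from the document root no matter which node SelectNodes is called on. The sketch below illustrates the difference. It uses System.Xml on a small well-formed stand-in for the page rather than HtmlAgilityPack itself, so the markup, the class name XPathContextDemo and the helper Count are assumptions made for illustration; the rule is standard XPath 1.0 and should carry over.

```csharp
using System;
using System.Xml;

public static class XPathContextDemo
{
    // Hypothetical, well-formed stand-in for the page: two selects,
    // the second one containing an optgroup.
    private const string Markup = @"<root>
  <select id='region'>
    <option value='0'>Všetky regiony</option>
    <option value='1'>Banská Bystrica</option>
  </select>
  <select id='onoffaci'>
    <option value='a_0'>Všetci</option>
    <optgroup label='Hlavné miestnosti'>
      <option value='m_13'>Bez záväzkov</option>
    </optgroup>
  </select>
</root>";

    // Returns the <select id='onoffaci'> element, used as the XPath context node.
    public static XmlNode LoadSelect()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectSingleNode("//select[@id='onoffaci']");
    }

    // Evaluates the expression with the select as context and counts the matches.
    public static int Count(string xpath)
    {
        return LoadSelect().SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Absolute path: every option in the document, including the other select.
        Console.WriteLine("//option  : " + Count("//option"));
        // Relative path: only options below the context node, at any depth.
        Console.WriteLine(".//option : " + Count(".//option"));
        // Child step: only options that are direct children of the context node.
        Console.WriteLine("option    : " + Count("option"));
    }
}
```

With this stand-in markup, the absolute expression matches options from both selects, while the two relative forms stay inside the onoffaci select.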
Edit:
I applied Darin Dimitrov's advice and tried this:
HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");
var nodes = selectNode.SelectNodes("option");
foreach (var node in nodes)
{
string roomName = node.NextSibling.InnerText;
string roomId = node.Attributes["value"].Value;
rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}
return rooms;
This parses only the first three option elements. I think the problem is that the select contains optgroup tags.
<select name="akt-miest" id="onoffaci">
<option value="a_0">Všetci</option>
<option value="a_1">Iba prihlásení</option>
<option value="a_5" selected="selected">Teraz na Pokeci</option>
<optgroup label="Hlavné miestnosti">
<option value="m_13"> Bez záväzkov</option>
<option value="m_9"> Do pohody</option>
<option value="m_39"> Dámsky klub</option>
</optgroup>
.
.
.
I tried to select all the following nodes with this:
var nodes = selectNode.SelectNodes("option::*");
But I get this error: xpath has an invalid token.
I would like to access all children of selectNode:
HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");
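On the invalid token error: in XPath the axis name comes before the :: separator, and option is an element name, not an axis, so option::* cannot be parsed. Forms such as child::*, child::option and descendant::option are valid. The sketch below checks this with System.Xml on a well-formed stand-in for the select; the class name XPathAxisDemo and the helpers Count and IsValid are hypothetical names introduced for illustration, and HtmlAgilityPack itself is not used here.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public static class XPathAxisDemo
{
    // Hypothetical, well-formed stand-in for the select from the question.
    private const string Markup = @"<select id='onoffaci'>
  <option value='a_0'>Všetci</option>
  <option value='a_1'>Iba prihlásení</option>
  <optgroup label='Hlavné miestnosti'>
    <option value='m_13'>Bez záväzkov</option>
    <option value='m_9'>Do pohody</option>
  </optgroup>
</select>";

    // Returns the <select> element, used as the XPath context node.
    public static XmlNode LoadSelect()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.DocumentElement;
    }

    // Evaluates the expression with the select as context and counts the matches.
    public static int Count(string xpath)
    {
        return LoadSelect().SelectNodes(xpath).Count;
    }

    // Reports whether the XPath engine can parse the expression at all.
    public static bool IsValid(string xpath)
    {
        try
        {
            XPathExpression.Compile(xpath);
            return true;
        }
        catch (Exception ex) when (ex is XPathException || ex is ArgumentException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // "option" is not an axis name, so this expression is rejected.
        Console.WriteLine("option::* valid?     " + IsValid("option::*"));
        // Direct child elements of the select: options and the optgroup.
        Console.WriteLine("child::*           : " + Count("child::*"));
        // Direct child options only; equivalent to the short form "option".
        Console.WriteLine("child::option      : " + Count("child::option"));
        // Options at any depth below the select, including those inside optgroup.
        Console.WriteLine("descendant::option : " + Count("descendant::option"));
    }
}
```

The descendant axis is the one that reaches options nested inside optgroup, which the plain child step does not.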
Edit #2:
Here is the whole HTML file from which I need to parse the option tags:
http://hotfile.com/dl/98442053/577b556/source.html
Recommended answer
By default, the <OPTION> tag is treated by Html Agility Pack as "Empty", which means it does not need a closing </OPTION>. In this case, the closing tag is discarded. You can change this behavior using the HtmlNode.ElementsFlags collection.
Here is code that should do what you want:
HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(yourHtml);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='onoffaci']//option"))
{
Console.WriteLine("Value=" + node.Attributes["value"].Value);
Console.WriteLine("InnerText=" + node.InnerText);
Console.WriteLine();
}