Parsing HTML: Reading Option Tag Content with HtmlAgilityPack


Question





I am trying to use HtmlAgilityPack to parse HTML, but am having problems.

Sample HTML Doc:

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>Všetky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">Všetci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

I need to parse the option values from <select name="akt-miest" id="onoffaci">.

For example, from:

<option value="a_0">Všetci</option>

I need to get the value a_0 and the text Všetci.

So I first tried to access the select by its id:

var selectNode = htmlDoc.GetElementbyId("onoffaci");

Then I selected all option nodes with XPath:

var nodes = selectNode.SelectNodes("//option");

And read the values:

foreach (var node in nodes)
{
    string roomName = node.NextSibling.InnerText;
    string roomId = node.Attributes["value"].Value;
    rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}

But I get values from another select (<select id="region" name="region">), which is at the top of the HTML code.

EDITED:

I applied Darin Dimitrov's advice and tried this:

HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");

var nodes = selectNode.SelectNodes("option");

foreach (var node in nodes)
{
    string roomName = node.NextSibling.InnerText;
    string roomId = node.Attributes["value"].Value;
    rooms.Add(new Room { RoomId = roomId, RoomName = roomName });
}

return rooms;

This parses only the first three option elements. I think the problem is that the select contains optgroup tags.

<select name="akt-miest" id="onoffaci">
  <option value="a_0">Všetci</option>
  <option value="a_1">Iba prihlásení</option>
  <option value="a_5" selected="selected">Teraz na Pokeci</option>
  <optgroup label="Hlavné miestnosti">
    <option value="m_13">&nbsp;&nbsp;&nbsp;Bez záväzkov</option>
    <option value="m_9">&nbsp;&nbsp;&nbsp;Do pohody</option>
    <option value="m_39">&nbsp;&nbsp;&nbsp;Dámsky klub</option>
  </optgroup>
  .
  .
  .

I tried to select all the following nodes with this:

var nodes = selectNode.SelectNodes("option::*");

But I get this error: xpath has an invalid token.

I would like to access all children of selectNode:

HtmlNode selectNode = htmlDoc.GetElementbyId("onoffaci");

EDIT #2:

Here is the whole HTML file from which I need to parse the option tags:

http://hotfile.com/dl/98442053/577b556/source.html

Solution

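By default, Html Agility Pack treats the <OPTION> tag as "Empty", which means it does not need a closing </OPTION>. In this case, the closing tag is discarded and the option's text is parsed as a following sibling instead of a child, which is why the code in the question had to read it from node.NextSibling. You can change this behavior using the HtmlNode.ElementsFlags collection.

Here is code that should do what you want: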

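HtmlDocument doc = new HtmlDocument();
// Remove the "Empty" flag so that <option> keeps its text as child content.
// This has to happen before the HTML is loaded.
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(yourHtml);

// "//option" under the select also matches options nested inside <optgroup>.
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='onoffaci']//option"))
{
    Console.WriteLine("Value=" + node.Attributes["value"].Value);
    Console.WriteLine("InnerText=" + node.InnerText);
    Console.WriteLine();
}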
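
The two XPath attempts in the question fail for reasons separate from the option flag. A path that starts with "//" is evaluated from the document root even when SelectNodes is called on selectNode, so "//option" also matches the options of the region select. "option::*" is rejected because "::" may only follow an axis name such as descendant. A relative path like ".//option" (short for descendant-or-self::node()/option) stays inside the context node and reaches the options nested in optgroup elements. The following is a minimal sketch of the question's loop rewritten along those lines. The Room class here is a hypothetical stand-in for the asker's type, RoomParser and ParseRooms are made-up names, and the entity decoding and trimming are an optional extra that the original answer does not include.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical stand-in for the Room type used in the question.
public class Room
{
    public string RoomId { get; set; }
    public string RoomName { get; set; }
}

public static class RoomParser
{
    // Hypothetical helper: collects all options of the select with id "onoffaci".
    public static List<Room> ParseRooms(string html)
    {
        // As in the answer: stop treating <option> as an empty element.
        // This must run before the document is loaded.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rooms = new List<Room>();

        HtmlNode selectNode = doc.GetElementbyId("onoffaci");
        if (selectNode == null)
        {
            return rooms; // no such select in this document
        }

        // ".//option" searches only below selectNode, at any depth,
        // so options inside <optgroup> are included.
        HtmlNodeCollection nodes = selectNode.SelectNodes(".//option");
        if (nodes == null)
        {
            return rooms; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            rooms.Add(new Room
            {
                // Falls back to an empty string if the attribute is missing.
                RoomId = node.GetAttributeValue("value", string.Empty),
                // Decode entities such as &nbsp; and trim surrounding whitespace.
                RoomName = HtmlEntity.DeEntitize(node.InnerText).Trim()
            });
        }

        return rooms;
    }
}
```

With the flag removed, the text is read from node.InnerText instead of node.NextSibling.InnerText, and the null checks avoid an exception when the select or its options are absent.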
