Get all the div IDs on an HTML page using Html Agility Pack

Question

How do I get all the div IDs on an HTML page using Html Agility Pack? I am trying to get all the IDs and put them into a collection.

<p>
    <div class='myclass1'>
        <div id='f'>
        </div>  
        <div id="myclass2">
            <div id="my"><div id="h"></div><div id="b"></div></div>
        </div>
    </div>
</p>

Code:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); 
htmlDoc.OptionFixNestedTags=true;
htmlDoc.Load(filePath);    
HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode("div"); 

How do I get a collection of all the div IDs?

Recommended answer

If you just want the IDs, you can write the XPath so that it targets the id attributes rather than the div elements themselves. For instance:

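List<string> ids = new List<string>();
// Html Agility Pack has no attribute nodes: for "//div/@id" it returns the div
// elements that carry an id, so read the attribute value from each one.
HtmlNodeCollection divs = htmlDoc.DocumentNode.SelectNodes("//div/@id");
if (divs != null) // SelectNodes returns null, not an empty collection, when nothing matches
{
    foreach (HtmlNode div in divs)
        ids.Add(div.GetAttributeValue("id", ""));
}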
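To see what //div/@id selects under standard XPath, the sketch below runs the same expression with .NET's built-in System.Xml instead of Html Agility Pack. This only works because the question's sample markup happens to be well-formed XML; typical real-world HTML is not, which is why a tolerant parser such as Html Agility Pack is normally used. System.Xml models attributes as nodes, so here SelectNodes returns XmlAttribute objects whose Value is the id. Variable names are illustrative, and the sketch assumes C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using System.Xml;

// The sample markup from the question; it happens to be well-formed XML,
// so System.Xml can parse it directly (real-world HTML usually cannot be).
const string markup = @"<p>
    <div class='myclass1'>
        <div id='f'>
        </div>
        <div id=""myclass2"">
            <div id=""my""><div id=""h""></div><div id=""b""></div></div>
        </div>
    </div>
</p>";

var doc = new XmlDocument();
doc.LoadXml(markup);

// Every div element, with or without an id attribute.
int divCount = doc.SelectNodes("//div")!.Count;

// Only the id attribute nodes; the outer <div class='myclass1'> contributes nothing.
string[] ids = doc.SelectNodes("//div/@id")!
    .Cast<XmlAttribute>()
    .Select(attr => attr.Value)
    .ToArray();

Console.WriteLine($"{divCount} divs, {ids.Length} ids: {string.Join(", ", ids)}");
```

Run as a console app, it should print `6 divs, 5 ids: f, myclass2, my, h, b`: the class-only outer div is counted by //div but adds nothing to //div/@id.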

Selecting //div/@id skips the div elements that don't have an ID, such as the <div class='myclass1'> element in your example.

"//div/@id" is an XPath string. XPath is a technology that is very handy to learn if you deal much with XML or, as in this case, with HTML via the Agility Pack library. It is an industry standard that lets you select matching nodes in an XML document.

  • // means the following step can match a child of the current node or any deeper descendant. Since the current node is the root node of the document, this will find matching nodes anywhere in the document.
  • div is an element name we want to match. So, in this case, we are telling it to find all div elements anywhere in the document.
  • / separates one location step from the next, so the step after it is evaluated relative to each div. By default a step selects child nodes, but here the next step begins with @, which selects from the div element's attributes instead. (In the XPath data model an attribute is not a child of its element, although the element is the attribute's parent.)
  • @id means we want to find all the id attributes. The @ symbol indicates that it is an attribute name instead of an element name.
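Each piece of the expression can be checked by comparing it with a near variant. This sketch again uses .NET's System.Xml on the question's sample markup (condensed onto one line, with the empty divs written as self-closing tags) and assumes C# 9+ top-level statements; the Count helper is a hypothetical convenience for this example, not part of any library.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"<p><div class='myclass1'><div id='f'/><div id='myclass2'><div id='my'><div id='h'/><div id='b'/></div></div></div></p>");

// Hypothetical helper for this sketch: count the nodes an XPath expression selects.
int Count(string xpath) => doc.SelectNodes(xpath)!.Count;

// '//' searches all descendants of the document; '/' only direct children.
int anyDepth = Count("//div");   // every div, at any depth
int topLevel = Count("/p/div");  // only divs that are direct children of <p>

// '@' switches to the attribute axis; without it, 'id' names a child element.
int idAttributes = Count("//div/@id"); // id attributes of divs
int idElements = Count("//div/id");    // <id> child elements: none exist here

Console.WriteLine($"//div={anyDepth}, /p/div={topLevel}, //div/@id={idAttributes}, //div/id={idElements}");
```

The expected counts are 6, 1, 5 and 0: only the outer div is a direct child of <p>, and no div has a child element named id, so dropping the @ selects nothing.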
