HtmlAgilityPack: parsing links and inner text

Question

I am new to HtmlAgilityPack, and I am trying to figure out a way to get the links from HTML set up like this:

<div class="std"><div style="border-right: 1px solid #CCCCCC; float: left; height: 590px; width: 190px;"><div style="background-color: #eae3db; padding: 8px 0 8px  20px; font-weight: bold; font-size: 13px;">test</div>
    <div>
    <div style="font-weight: bold; margin: 5px 0 -6px;">FEATURED</div>
    <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat1</span></a></span>
     <span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat2</span></a></span>
</div></div>

I have not written any C# code yet, but I was wondering whether anyone could advise which tags I should target to get the links and inner text when the elements have no HTML IDs. Thanks

Recommended answer

If you are familiar with XPath, you can navigate through the elements and attributes of the HTML to get whatever you want. To get each href in the example above, you could write code like this:

 using System;
 using HtmlAgilityPack;

 const string xpath = "/div//span/a";

 //WebPage below is a string that contains the text of your example
 HtmlNode html = HtmlNode.CreateNode(WebPage);
 //The following gives you a node collection of your two <a> elements
 HtmlNodeCollection items = html.SelectNodes(xpath);
 //SelectNodes returns null (not an empty collection) when nothing matches
 if (items != null)
 {
      foreach (HtmlNode a in items)
      {
           //Get your value here
           if (a.Attributes.Contains("href"))
           {
                string yourValue = a.Attributes["href"].Value;
                Console.WriteLine(yourValue);
           }
      }
 }

Note: I have not run or tested this code
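The answer above only extracts the href values, while the question also asks for the inner text and how to target elements that have no IDs. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. It selects the links by the widget-category-link class token on their parent span instead of by an ID, reads href with GetAttributeValue, and reads the link text with InnerText. It loads the markup into an HtmlDocument (rather than using HtmlNode.CreateNode) so the same code also works on a full page. The class name CategoryLinkParser, the method Parse, and the trimmed sample HTML are illustrative names chosen here, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class CategoryLinkParser
{
    // Selects <a href="..."> elements whose parent <span> has the class token
    // "widget-category-link". The concat/normalize-space idiom matches whole
    // class tokens, so a class like "widget-category-link-old" is not matched.
    private const string LinkXPath =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' widget-category-link ')]/a[@href]";

    public static List<(string Href, string Text)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Href, string Text)>();

        // By default SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(LinkXPath);
        if (links == null)
        {
            return results;
        }

        foreach (HtmlNode a in links)
        {
            string href = a.GetAttributeValue("href", string.Empty);
            // InnerText concatenates the text of all descendant nodes, so the
            // nested <span>cat1</span> comes back as "cat1".
            string text = HtmlEntity.DeEntitize(a.InnerText).Trim();
            results.Add((href, text));
        }

        return results;
    }

    public static void Main()
    {
        // Trimmed version of the HTML from the question.
        const string html = @"<div class=""std""><div>
    <span class=""widget widget-category-link""><a href=""http://www.href1.com""><span>cat1</span></a></span>
    <span class=""widget widget-category-link""><a href=""http://www.href1.com""><span>cat2</span></a></span>
</div></div>";

        foreach (var (href, text) in Parse(html))
        {
            Console.WriteLine($"{text} -> {href}");
        }
    }
}
```

Running Main should print "cat1 -> http://www.href1.com" and "cat2 -> http://www.href1.com". Matching on the class token rather than on the exact nesting of div elements is a design choice: it keeps the query working if the surrounding layout divs change, at the cost of relying on the site keeping that class name.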