HtmlAgilityPack: parsing links and inner text
Problem description
I am new to HtmlAgilityPack and am trying to figure out how to get the links from HTML set up like this:
<div class="std"><div style="border-right: 1px solid #CCCCCC; float: left; height: 590px; width: 190px;"><div style="background-color: #eae3db; padding: 8px 0 8px 20px; font-weight: bold; font-size: 13px;">test</div>
<div>
<div style="font-weight: bold; margin: 5px 0 -6px;">FEATURED</div>
<span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat1</span></a></span>
<span class="widget widget-category-link"><a href="http://www.href1.com"><span>cat2</span></a></span>
</div></div>
I have not written any C# code yet, but I was wondering whether anyone could advise which tags I should target to get the links and their inner text when the elements have no HTML IDs. Thanks.
Recommended answer
If you are familiar with XPath, you can navigate through the elements and attributes of the HTML to get whatever you want. To get each href in the example above, you could write code like the following:
// Requires: using HtmlAgilityPack;
const string xpath = "/div//span/a";
// WebPage below is a string that contains the text of your example
HtmlNode html = HtmlNode.CreateNode(WebPage);
// The following gives you a node collection of your two <a> elements
// (SelectNodes returns null when nothing matches)
HtmlNodeCollection items = html.SelectNodes(xpath);
if (items != null)
{
    foreach (HtmlNode a in items)
    {
        if (a.Attributes.Contains("href"))
        {
            // Get your values here
            string href = a.Attributes["href"].Value;
            string text = a.InnerText; // text of the nested <span>
        }
    }
}
Note: I have not run or tested this code.
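Since the markup has no ids, another option is to select on the class attribute of the wrapping span instead of relying on the element path. The following is a minimal sketch of that idea, not a tested solution: the class name LinkExtractor and the webPage string are placeholders introduced for illustration, and the HTML is a trimmed copy of the example from the question.

```csharp
using System;
using HtmlAgilityPack;

class LinkExtractor
{
    static void Main()
    {
        // Assumption: in practice this string would hold the downloaded page.
        string webPage =
            "<div><span class=\"widget widget-category-link\">" +
            "<a href=\"http://www.href1.com\"><span>cat1</span></a></span>" +
            "<span class=\"widget widget-category-link\">" +
            "<a href=\"http://www.href1.com\"><span>cat2</span></a></span></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(webPage);

        // Match on the span's class attribute, since the markup has no ids.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(
            "//span[contains(@class, 'widget-category-link')]/a[@href]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links == null) return;

        foreach (HtmlNode a in links)
        {
            string href = a.GetAttributeValue("href", string.Empty);
            string text = a.InnerText.Trim(); // text of the nested <span>
            Console.WriteLine($"{text} -> {href}");
        }
    }
}
```

Loading the page through HtmlDocument.LoadHtml and querying from DocumentNode means the XPath does not depend on which element happens to be the top-level node, which may be more forgiving if the surrounding page structure changes.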