How to get href elements and attributes for each node?


Question

I am working on a project that should read HTML, find all nodes that match a value, and then find the elements and attributes of the located nodes. I am having difficulty figuring out how to get the href attributes and elements, though.

I am using HtmlAgilityPack. I have numerous nodes with class="middle" throughout the HTML. I need to get all of them and, from each one, get the href element and attributes. Below is a sample of the HTML:

<div class="top">
    <div class="left">
        <a href="item123">
            <img src="url.png" border="0" />
        </a>
    </div>
</div>
<div class="middle">
    <div class="title"><a href="item123">Captains Hat</a></div>

    <div class="day">monday</div>

    <div class="city">Tuscon, AZ | 100 Days | <script type="text/javascript">document.write(ts_to_age_min(1445620427));</script></div>

</div>

I have been able to get the other attributes I need, but not the href. Here is the code I have:

List<string> listResults = new List<string>();
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(url);

// get each listing
foreach (HtmlNode node in doc.DocumentNode.Descendants("div").Where(d =>
    d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("middle")))
{
    string day = node.SelectSingleNode(".//*[contains(@class,'day')]").InnerHtml;
    string city = node.SelectSingleNode(".//*[contains(@class,'city')]").InnerHtml;
    string item = node.SelectSingleNode("//a").Attributes["href"].Value;

    listResults.Add(day + Environment.NewLine
        + city + Environment.NewLine
        + item + Environment.NewLine + Environment.NewLine);
}

The code above, however, gives me the first href value of the whole HTML page, and it gives that same value for every node (visible when I output the list to a message box). I thought that, inside my foreach loop, SelectSingleNode would get the first href attribute for that specific node. If so, why am I getting the first href attribute of the whole loaded page?

I've been going through lots of threads on here about getting href values with HtmlAgilityPack, but I haven't been able to get this to work.

How can I get the href attribute and elements for each node I'm selecting based on the class attribute (class="middle")?

Recommended answer

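Try replacing

 string item = node.SelectSingleNode("//a").Attributes["href"].Value;

with

 string item = node.SelectSingleNode(".//a").Attributes["href"].Value;

An XPath expression that starts with // is evaluated from the document root, even when it is called on a specific node, so it always returns the first a element of the whole page. The leading dot makes the search relative to the current node.

Other than that, the code above works for me.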

Alternatively:

string item = node.SelectSingleNode(".//*[contains(@class,'title')]")
              .Descendants("a").FirstOrDefault().Attributes["href"].Value; 
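
To make the difference between the absolute and the relative expression concrete, here is a minimal sketch rather than a definitive implementation. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HrefDemo, the helper GetHrefs, and the sample markup (including the hrefs top-link, item123 and item456) are made up for illustration and are not part of the original question. It also checks for null instead of dereferencing the result directly, since SelectSingleNode returns null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HrefDemo
{
    // Hypothetical helper: for every div whose class contains "middle",
    // collect the href of the first link found INSIDE that div.
    public static List<string> GetHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml expects markup, not a URL

        var results = new List<string>();
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[contains(@class,'middle')]");
        if (nodes == null)
        {
            return results; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            // ".//a" is relative to the current node;
            // "//a" would restart the search at the document root.
            HtmlNode link = node.SelectSingleNode(".//a[@href]");
            if (link != null)
            {
                results.Add(link.GetAttributeValue("href", string.Empty));
            }
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample: one link outside the "middle" divs, one inside each.
        string html =
            "<div class=\"top\"><a href=\"top-link\">logo</a></div>" +
            "<div class=\"middle\"><div class=\"title\"><a href=\"item123\">Captains Hat</a></div></div>" +
            "<div class=\"middle\"><div class=\"title\"><a href=\"item456\">Sailors Cap</a></div></div>";

        foreach (string href in GetHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

With this sample markup the relative expression yields item123 and item456, one per "middle" div. Switching the inner query back to "//a" would return top-link for both, which is the behaviour described in the question.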
