Getting the Page Source of a Website


Question



Hi, I seem to be having trouble working with the HTML source of some websites, for example the page source at:

view-source:http://www.booksamillion.com/search?id=5910205702379&query=hunger+games&where=book_title&search.x=24&search.y=9&search=Search&affiliate=&sort=price_ascending

I am trying to get the link to the book, as well as the price, from this:

<div class="meta">


        <span class="title"><a href="http://www.booksamillion.com/p/Hunger-Games-Sparknotes-Literature-Guide/SparkNotes/9781411470989?id=5910205702379" title="The Hunger Games Sparknotes Literature Guide"><img src="http://covers2.booksamillion.com/covers/bam/1/41/147/098/1411470982_t.jpg" width="60" alt="The Hunger Games Sparknotes Literature Guide">The Hunger Games Sparknotes Literature Guide</a> (Paperback)</span>

        <span class="byline">by <a href="search?type=author&query=SparkNotes&id=5910205702379" title="SparkNotes">SparkNotes</a>, <a href="search?type=author&query=Suzanne Collins&id=5910205702379" title="Suzanne Collins">Suzanne Collins</a>

        <br>ISBN 9781411470989 / February 2014</span>
        <br><br>
<span class="ebook-price">Online Price: $5.95</span>

<span class="ebook-price">Marketplace Price from: $6.39</span>

        <div class="availability_search_results">In Stock.</div>
    </div><!-- end meta -->



This is what I have as code at the moment:

string getPrice = string.Empty;
string getUrl = string.Empty;
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(responseData); // load html
HtmlAgilityPack.HtmlNode rootNode = htmlDoc.DocumentNode;
HtmlAgilityPack.HtmlNodeCollection allBookResults = rootNode.SelectNodes("//div[@class='meta']");

foreach (HtmlAgilityPack.HtmlNode node in allBookResults)
{
    getUrl = node.SelectSingleNode("//span[@class='byline']").GetAttributeValue("content", null).ToString();
    HtmlAgilityPack.HtmlNode dataNode = node.SelectSingleNode("//span[@class='ebook-price']");

    foreach (HtmlAgilityPack.HtmlNode bookPriceNode in dataNode.ChildNodes)
    {
        getPrice = bookPriceNode.SelectSingleNode("//span[@class='ebook-price']").GetAttributeValue("content", null).ToString();
    }
}



It seems that the code I have is not written properly, since I get a null error when debugging. Could I get a short explanation of how span classes and attributes are used and how to capture them, so I can get a rough idea of how to capture the book link and price from other websites as well?

Thanks a bunch!
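In the code above, the null error most likely comes from two things. An XPath expression that begins with `//` searches from the document root even when it is called on a child node, so `node.SelectSingleNode("//span[@class='byline']")` returns the first matching span on the whole page, not the one inside the current `div`. A search limited to the current node starts with `.//`. In addition, none of these spans has a `content` attribute, so `GetAttributeValue("content", null)` returns `null`, and calling `.ToString()` on that result throws. In this markup the book link is the `href` attribute of the `<a>` inside `span.title` (the `byline` links point to author searches), and the price is the text of `span.ebook-price`, which `InnerText` returns. The following is a minimal sketch under those assumptions. It requires the HtmlAgilityPack NuGet package, and `BookScraper` and `ExtractBooks` are hypothetical names.

```csharp
// Requires the HtmlAgilityPack NuGet package (dotnet add package HtmlAgilityPack).
using System.Collections.Generic;
using HtmlAgilityPack;

public static class BookScraper
{
    // Returns one (Link, Price) pair per <div class="meta"> result block.
    public static List<(string Link, string Price)> ExtractBooks(string html)
    {
        var results = new List<(string Link, string Price)>();

        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        // By default SelectNodes returns null (not an empty collection) when
        // nothing matches, e.g. if the downloaded HTML differs from what you expected.
        HtmlNodeCollection metaDivs = doc.DocumentNode.SelectNodes("//div[@class='meta']");
        if (metaDivs == null)
            return results;

        foreach (HtmlNode meta in metaDivs)
        {
            // ".//" searches only below the current div; "//" would restart at
            // the document root and return the first match on the whole page.
            HtmlNode link = meta.SelectSingleNode(".//span[@class='title']/a");
            HtmlNode price = meta.SelectSingleNode(".//span[@class='ebook-price']");

            // The URL is an attribute of <a>; the price is the span's text content.
            string href = link?.GetAttributeValue("href", string.Empty) ?? string.Empty;
            string priceText = price == null
                ? string.Empty
                : HtmlEntity.DeEntitize(price.InnerText).Trim();

            results.Add((href, priceText));
        }
        return results;
    }
}
```

For the snippet in the question, the first entry should contain the SparkNotes product URL and the text "Online Price: $5.95". The class names (`meta`, `title`, `ebook-price`) are specific to this site. For other websites the approach is the same (find the repeating container, then use relative XPath for the link and price inside it), but the expressions have to be adapted to their markup.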

Solution

I have no idea what you are trying to do to get the source of an HTML document. All you need is to download it as is, without any rendering or anything like that. You can use either the class System.Net.WebClient or, even better, System.Net.HttpWebRequest:
http://msdn.microsoft.com/en-us/library/system.net.webclient%28v=vs.110%29.aspx,
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest%28v=vs.110%29.aspx.

—SA
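Following the suggestion above, here is a minimal sketch of downloading the raw source with `WebRequest.Create`, which returns an `HttpWebRequest` for `http` and `https` URLs. The returned string is what the question's code passes to `htmlDoc.LoadHtml(responseData)`. The `User-Agent` value and the UTF-8 decoding are assumptions rather than anything the site is known to require, and `PageSource.Download` is a hypothetical name. On current .NET, `WebRequest` and `WebClient` are marked obsolete and `HttpClient` is the recommended replacement for new code.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class PageSource
{
    // Downloads the page's markup unchanged: no rendering, no script execution.
    public static string Download(string url)
    {
        // For http/https URLs this returns an HttpWebRequest.
        WebRequest request = WebRequest.Create(url);

        if (request is HttpWebRequest http)
        {
            // Assumption: some sites serve different content to clients without
            // a browser-like User-Agent header.
            http.UserAgent = "Mozilla/5.0";
            // Transparently decompress gzip/deflate responses.
            http.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }

        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        // Assumption: the page is UTF-8; a byte-order mark, if present, still takes precedence.
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Because nothing is rendered, content that a site inserts with JavaScript after the page loads will not be in this string. Only the markup sent by the server, which is what `view-source:` displays, is returned.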

