Getting the Page Source of a Website

Views: 63  Published: 2019/6/14 22:18:58  C# Visual-Studio HTML HTTP


Problem description

Hi, I am having trouble working with the source of some websites, for example the page source at

view-source:http://www.booksamillion.com/search?id=5910205702379&query=hunger+games&where=book_title&search.x=24&search.y=9&search=Search&affiliate=&sort=price_ascending

I am trying to get the link to the book, as well as its price, from this markup:

<div class="meta">


        <span class="title"><a href="http://www.booksamillion.com/p/Hunger-Games-Sparknotes-Literature-Guide/SparkNotes/9781411470989?id=5910205702379" title="The Hunger Games Sparknotes Literature Guide"><img src="http://covers2.booksamillion.com/covers/bam/1/41/147/098/1411470982_t.jpg" width="60" alt="The Hunger Games Sparknotes Literature Guide">The Hunger Games Sparknotes Literature Guide</a> (Paperback)</span>

        <span class="byline">by <a href="search?type=author&query=SparkNotes&id=5910205702379" title="SparkNotes">SparkNotes</a>, <a href="search?type=author&query=Suzanne Collins&id=5910205702379" title="Suzanne Collins">Suzanne Collins</a>

        <br>ISBN 9781411470989 / February 2014</span>
        <br><br>
<span class="ebook-price">Online Price: $5.95</span>

<span class="ebook-price">Marketplace Price from: $6.39</span>

        <div class="availability_search_results">In Stock.</div>
    </div><!-- end meta -->

This is what I have as code at the moment:

string getPrice = string.Empty;
string getUrl = string.Empty;
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(responseData); // load html
HtmlAgilityPack.HtmlNode rootNode = htmlDoc.DocumentNode;
HtmlAgilityPack.HtmlNodeCollection allBookResults = rootNode.SelectNodes("//div[@class='meta']");

foreach (HtmlAgilityPack.HtmlNode node in allBookResults)
{
    getUrl = node.SelectSingleNode("//span[@class='byline']").GetAttributeValue("content", null).ToString();
    HtmlAgilityPack.HtmlNode dataNode = node.SelectSingleNode("//span[@class='ebook-price']");

    foreach (HtmlAgilityPack.HtmlNode bookPriceNode in dataNode.ChildNodes)
    {
        getPrice = bookPriceNode.SelectSingleNode("//span[@class='ebook-price']").GetAttributeValue("content", null).ToString();
    }
}

It seems my code is not written properly, since I get a null error when debugging. Could I get a short explanation of how to select span elements by class and read their attributes and text, so that I have a rough idea of how to capture the book link and price from other websites as well?

Thanks a bunch!

Solution

I am not sure what you are trying to do to get the source of the HTML document. All you need is to download it as is, without any rendering or anything like that. You can use either the class System.Net.WebClient or, even better, System.Net.HttpWebRequest:
http://msdn.microsoft.com/en-us/library/system.net.webclient%28v=vs.110%29.aspx
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest%28v=vs.110%29.aspx
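
As a rough illustration of the two classes mentioned above, here is a minimal sketch of downloading a page's raw HTML. The class name PageSource and both method names are made up for this example and do not come from the thread. It assumes the classic .NET Framework APIs; newer .NET versions still ship them but mark them obsolete in favor of HttpClient.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class; the names are illustrative only.
public static class PageSource
{
    // Option 1: WebClient, the shortest way to fetch a page as a string.
    public static string DownloadWithWebClient(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // Option 2: WebRequest. For an http:// or https:// URL, WebRequest.Create
    // returns an HttpWebRequest, which exposes headers, timeouts and so on.
    public static string DownloadWithWebRequest(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream)) // assumes UTF-8 content
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned string could then be passed to htmlDoc.LoadHtml in place of responseData from the question, for example: string responseData = PageSource.DownloadWithWebClient(searchUrl); where searchUrl is a placeholder for the search address.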

—SA
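
On the null error itself, the thread gives no confirmed diagnosis, so the following is only a sketch under stated assumptions. The markup in the question has no content attribute on any span, so GetAttributeValue("content", null) returns the null default and the following ToString() call fails. The book link sits in the href of the anchor inside span class="title", and the price is the text of span class="ebook-price". An XPath that starts with // also searches the whole document rather than the current div; a leading dot keeps it relative. The class name BookResultParser and the method Parse are hypothetical, and the code assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper; names are illustrative, not from the thread.
public static class BookResultParser
{
    // Returns one (link, price text) pair per <div class="meta"> block.
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null rather than an empty collection when
        // nothing matches, so guard before iterating.
        HtmlNodeCollection metaNodes = doc.DocumentNode.SelectNodes("//div[@class='meta']");
        if (metaNodes == null)
        {
            return results;
        }

        foreach (HtmlNode meta in metaNodes)
        {
            // The leading "." restricts the search to this div's descendants.
            HtmlNode link = meta.SelectSingleNode(".//span[@class='title']/a");
            // Takes the first price span only (the "Online Price" in the sample).
            HtmlNode price = meta.SelectSingleNode(".//span[@class='ebook-price']");

            // The link is an attribute value; the price is element text.
            string url = link != null
                ? link.GetAttributeValue("href", string.Empty)
                : string.Empty;
            string priceText = price != null
                ? HtmlEntity.DeEntitize(price.InnerText).Trim()
                : string.Empty;

            results.Add(new KeyValuePair<string, string>(url, priceText));
        }

        return results;
    }
}
```

The same pattern should carry over to other sites: find the repeating container element, use relative XPath inside it, read attributes with GetAttributeValue and visible text with InnerText, and check for null wherever a node may be missing.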
