Getting the page source of a website
Problem description
Hi, I seem to be having trouble working with the source of some websites, for example the page source at
view-source:http://www.booksamillion.com/search?id=5910205702379&query=hunger+games&where=book_title&search.x=24&search.y=9&search=Search&affiliate=&sort=price_ascending
I am trying to get the link to the book as well as the price from this:
<div class="meta">
<span class="title"><a href="http://www.booksamillion.com/p/Hunger-Games-Sparknotes-Literature-Guide/SparkNotes/9781411470989?id=5910205702379" title="The Hunger Games Sparknotes Literature Guide"><img src="http://covers2.booksamillion.com/covers/bam/1/41/147/098/1411470982_t.jpg" width="60" alt="The Hunger Games Sparknotes Literature Guide">The Hunger Games Sparknotes Literature Guide</a> (Paperback)</span>
<span class="byline">by <a href="search?type=author&query=SparkNotes&id=5910205702379" title="SparkNotes">SparkNotes</a>, <a href="search?type=author&query=Suzanne Collins&id=5910205702379" title="Suzanne Collins">Suzanne Collins</a>
<br>ISBN 9781411470989 / February 2014</span>
<br><br>
<span class="ebook-price">Online Price: $5.95</span>
<span class="ebook-price">Marketplace Price from: $6.39</span>
<div class="availability_search_results">In Stock.</div>
</div><!-- end meta -->
This is the code I have at the moment:
string getPrice = string.Empty;
string getUrl = string.Empty;
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
htmlDoc.LoadHtml(responseData); // load html
HtmlAgilityPack.HtmlNode rootNode = htmlDoc.DocumentNode;
HtmlAgilityPack.HtmlNodeCollection allBookResults = rootNode.SelectNodes("//div[@class='meta']");
foreach (HtmlAgilityPack.HtmlNode node in allBookResults)
{
    getUrl = node.SelectSingleNode("//span[@class='byline']").GetAttributeValue("content", null).ToString();
    HtmlAgilityPack.HtmlNode dataNode = node.SelectSingleNode("//span[@class='ebook-price']");
    foreach (HtmlAgilityPack.HtmlNode bookPriceNode in dataNode.ChildNodes)
    {
        getPrice = bookPriceNode.SelectSingleNode("//span[@class='ebook-price']").GetAttributeValue("content", null).ToString();
    }
}
It seems that my code is not written properly, since I get a null error when debugging. Could I get a short explanation of how span classes and properties are used and captured, so that I have a rough idea of how to capture the book link and price from other websites as well?
Thanks a bunch!
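On the null error itself: in the markup above, none of the span elements has a content attribute, so GetAttributeValue("content", null) returns null and the following .ToString() throws. Also, an XPath that starts with // searches from the document root even when it is called on a child node, while .// keeps the search inside the current node. The book link is the href of the a element inside span.title, and the price is the inner text of span.ebook-price. A minimal sketch under those assumptions follows; the names BookResultParser and Parse are hypothetical, and it relies on the HtmlAgilityPack package already used in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package, as used in the question

public static class BookResultParser
{
    // Returns one (link URL, price text) pair per <div class="meta"> block.
    public static List<KeyValuePair<string, string>> Parse(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection blocks = doc.DocumentNode.SelectNodes("//div[@class='meta']");
        if (blocks == null)
        {
            return results;
        }

        foreach (HtmlNode block in blocks)
        {
            // ".//" searches inside this block only; "//" would restart at the document root.
            HtmlNode link = block.SelectSingleNode(".//span[@class='title']/a");
            HtmlNode price = block.SelectSingleNode(".//span[@class='ebook-price']");

            // The URL lives in the href attribute of the <a> element.
            string url = link == null
                ? string.Empty
                : link.GetAttributeValue("href", string.Empty);

            // The price is element text, not an attribute, so read InnerText.
            string priceText = price == null
                ? string.Empty
                : HtmlEntity.DeEntitize(price.InnerText).Trim();

            results.Add(new KeyValuePair<string, string>(url, priceText));
        }

        return results;
    }
}
```

For the single div.meta block shown in the question, this would yield one pair: the href of the title link and the text "Online Price: $5.95" (the first of the two ebook-price spans). Other sites will use different class names, so the XPath expressions have to be adapted to each page's markup.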
I have no idea what you are trying to do to get the source of an HTML document. All you need is to download it as is, without any rendering or anything like that. You can use either the class System.Net.WebClient or, even better, System.Net.HttpWebRequest:
http://msdn.microsoft.com/en-us/library/system.net.webclient%28v=vs.110%29.aspx,
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest%28v=vs.110%29.aspx.
—SA
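As a rough illustration of the download step described in the answer, here is a minimal sketch using System.Net.WebClient. The class name PageDownloader and the method name DownloadHtml are hypothetical, and the UTF-8 encoding is an assumption about the target page. Note that newer .NET versions mark WebClient and HttpWebRequest as obsolete in favor of HttpClient, so this follows the answer's suggestion rather than current guidance.

```csharp
using System;
using System.Net;
using System.Text;

public static class PageDownloader
{
    // Downloads the raw markup at the given URL without rendering it.
    public static string DownloadHtml(string url)
    {
        using (var client = new WebClient())
        {
            // Assumption: the page is UTF-8 encoded; adjust if the site differs.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

The returned string is the same markup that view-source shows in the browser and can be passed directly to htmlDoc.LoadHtml(responseData) as in the question.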