How can I retrieve and parse just the HTML returned from a URL?


Problem description



I want to be able to programmatically (without displaying anything in the browser) request a URL such as http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=platypi&sprefix=platypi%2Caps&rh=i%3Aaps%2Ck%3Aplatypi and get back, as a string (or some more appropriate data type?), the HTML of the resulting page (the interesting part, anyway), so that I can parse it and reformat selected parts as matched text and images (which link to the appropriate pages). I want to do this with Razor/Web Pages, if that makes any difference.

IOW, this is sort of a screen-scraping question, but really a "behind-the-screen" scraping.

Is it possible? How? A 100-point post-answer bonus will be awarded to the answer (or to the most helpful one).

Solution

Use the WebClient class (or the better HttpClient class available from .NET 4.5) to download the HTML as a string, then use the HTML Agility Pack to parse it.
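
A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is installed. The `Scraper` class, its method names, the XPath `//a[@href]`, and the `http://www.example.com/` URL are illustrative placeholders, not part of the original answer. A real results page will need its own XPath expressions, which depend on that site's markup.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // assumption: installed from NuGet

public static class Scraper
{
    // One shared HttpClient instance for all requests.
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the response body of the given URL as a string.
    public static Task<string> DownloadHtmlAsync(string url)
    {
        return Client.GetStringAsync(url);
    }

    // Parses the HTML and returns (link text, href) pairs for every <a> with an href.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<KeyValuePair<string, string>>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return results;
        }

        foreach (HtmlNode anchor in anchors)
        {
            string text = HtmlEntity.DeEntitize(anchor.InnerText).Trim();
            string href = anchor.GetAttributeValue("href", string.Empty);
            results.Add(new KeyValuePair<string, string>(text, href));
        }
        return results;
    }

    public static void Main()
    {
        // Placeholder URL; blocking on the task keeps the entry point synchronous.
        string html = DownloadHtmlAsync("http://www.example.com/").GetAwaiter().GetResult();

        foreach (KeyValuePair<string, string> link in ExtractLinks(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

Keeping the download and the parsing in separate methods means the parsing step can be tried on a saved HTML string without any network access. In a Razor/Web Pages page, the same `ExtractLinks` call could feed a loop that renders the selected text and links.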
