How to get HTML content of webpage from ASP.NET


Question

I would like to scrape some content from a dynamic web page (it seems to be developed in MVC).

The scraping logic is done with HTML Agility Pack, but the issue is that the HTML returned when the URL is requested from a browser differs from the response to an ASP.NET web request for the same URL.

Specifically, the browser response contains the dynamic data I need (rendered based on the value passed in the query string), but the WebResponse result is different.

Could you please help me get the actual content of the dynamic web page via WebRequest?

Below is the code I used to read the page:

WebRequest request = WebRequest.Create(sURL);
request.Method = "Get";
//Get the response
WebResponse response = request.GetResponse();
//Read the stream from the response
StreamReader reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8);

Recommended answer

Use HttpWebRequest ...

// Requires: using System; using System.IO; using System.Net;
// We will store the HTML response of the request here
string siteContent = string.Empty;

// The url you want to grab
string url = "http://google.com";

// Here we're creating our request, we haven't actually sent the request to the site yet...
// we're simply building our HTTP request to shoot off to google...
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AutomaticDecompression = DecompressionMethods.GZip;

// Right now... this is what our HTTP Request has been built in to...
/*
    GET http://google.com/ HTTP/1.1
    Host: google.com
    Accept-Encoding: gzip
    Connection: Keep-Alive
*/


// Wrap everything that can be disposed in using blocks... 
// They dispose of objects and prevent them from lying around in memory...
using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())  // Go query google
using(Stream responseStream = response.GetResponseStream())               // Load the response stream
using(StreamReader streamReader = new StreamReader(responseStream))       // Load the stream reader to read the response
{
    siteContent = streamReader.ReadToEnd(); // Read the entire response and store it in the siteContent variable
}

// Print the downloaded HTML
Console.WriteLine(siteContent);
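
One possible reason the browser and the web request see different HTML is that the server varies its output based on request headers or cookies. The sketch below is not part of the original answer: it assumes that is the cause and sends a few browser-like headers. The class name PageFetcher, both method names, and the header values are hypothetical and only for illustration. Note that HttpWebRequest is marked obsolete in recent .NET versions, although it still works.

```csharp
using System;
using System.IO;
using System.Net;

public static class PageFetcher
{
    // Builds a GET request carrying headers similar to those a browser sends.
    // The header values are illustrative assumptions, not required values.
    public static HttpWebRequest BuildBrowserLikeRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        request.Accept = "text/html,application/xhtml+xml";
        request.Headers["Accept-Language"] = "en-US,en;q=0.9";
        // Keep any cookies the server sets so redirects can see them.
        request.CookieContainer = new CookieContainer();
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Fetch(string url)
    {
        HttpWebRequest request = BuildBrowserLikeRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling PageFetcher.Fetch(url) returns the HTML as a string, which can then be loaded into HTML Agility Pack. If the missing data is filled in by JavaScript after the page loads in the browser, changing headers will not help, because a plain web request never executes scripts.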
