Can't download a webpage via C# WebClient or via request/response


Problem description

I want to download the HTML of web pages, but several links give me trouble, for example http://www.business-top.info/ and http://azerizv.az/. I receive no HTML at all with either of the following approaches.

1. WebClient:

using (var client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    string result = client.DownloadString(resultUrl);
    Console.WriteLine(result);
    Console.ReadLine();
}

2. HTTP request/response:

var request = (HttpWebRequest)WebRequest.Create(resultUrl);
request.Method = "GET"; // a plain page download is a GET request
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var sr = new StreamReader(stream, Encoding.UTF8)) // dispose the reader as well
{
    string data = sr.ReadToEnd();
    Console.WriteLine(data);
    Console.ReadLine();
}

There are many such links, so I can't save the HTML by hand from the browser's "view source" for each page.

Recommended answer

Some pages load in stages. First they load the core of the page, and only then do they run the JavaScript inside it, which fetches further content via AJAX. A plain HTTP client such as WebClient or HttpWebRequest only ever receives that first stage, because it does not execute scripts. To scrape these pages you need a more capable scraping library than a simple HTTP request sender, one that can run the page's JavaScript.
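With many links to process, it may help to find out which pages are of this kind before reaching for heavier tools. The following is only a rough sketch: `PageProbe`, `VisibleText`, `LooksScriptDriven` and the 200-character threshold are hypothetical names and values chosen for illustration, not anything from the question or the answer. It flags HTML that contains at least one script tag but almost no visible text once scripts, styles and tags are stripped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageProbe
{
    // Strips <script>/<style> blocks and all remaining tags,
    // then collapses whitespace, leaving roughly the visible text.
    public static string VisibleText(string html)
    {
        string noScripts = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");
        return Regex.Replace(noTags, @"\s+", " ").Trim();
    }

    // Hypothetical heuristic: the page has at least one <script> tag
    // and fewer than minVisibleChars characters of visible text.
    public static bool LooksScriptDriven(string html, int minVisibleChars = 200)
    {
        bool hasScript = Regex.IsMatch(html, @"<script\b", RegexOptions.IgnoreCase);
        return hasScript && VisibleText(html).Length < minVisibleChars;
    }
}
```

Calling `PageProbe.LooksScriptDriven(result)` on the string returned by `DownloadString` would give a first hint. The regular expressions are a crude approximation of HTML parsing and the threshold is arbitrary, so treat a positive result as a reason to look closer rather than as proof.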

Here is a Stack Overflow question about the same problem you are having now: Jquery Ajax Web page scraping using c#
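One common way to get the HTML after the scripts have run is to drive a real browser engine. The sketch below assumes the Selenium WebDriver packages (`Selenium.WebDriver` and `Selenium.Support`) and a local Chrome installation; the answer does not name Selenium, so it stands in here as one possible "more advanced" tool. `RenderedPageDownloader` and `DownloadRenderedHtml` are hypothetical names.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class RenderedPageDownloader
{
    // Loads the URL in headless Chrome and returns the DOM serialized as HTML.
    public static string DownloadRenderedHtml(string url, TimeSpan timeout)
    {
        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run without a visible window

        using (var driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);

            // Wait until the document reports it has finished loading.
            // Assumption: this is enough for the target page; content fetched
            // by later AJAX calls may need a wait for a specific element instead.
            var wait = new WebDriverWait(driver, timeout);
            wait.Until(d => ((IJavaScriptExecutor)d)
                .ExecuteScript("return document.readyState")
                .Equals("complete"));

            return driver.PageSource;
        }
    }

    public static void Main()
    {
        // URL taken from the question.
        string html = DownloadRenderedHtml(
            "http://www.business-top.info/", TimeSpan.FromSeconds(30));
        Console.WriteLine(html);
    }
}
```

Starting a browser for every link is much slower than a plain HTTP request, so for a long list it is probably worth reserving this path for the pages where the simple download comes back empty.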
