HttpWebRequest unable to download data from nasdaq.com, although browsers can

Problem description

I am trying to download the CSV file at the URL below. The file is small and takes only about 2 seconds to download in any browser.

http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download

I have tried both HttpWebRequest and WebClient, but nasdaq.com does not seem to return any data to either of them. I also tried Fiddler and nothing came back. I can only download this data with a browser.

I tried changing the headers, the user agent, the security protocol, redirect handling, a few cookie settings, and many other settings, but I am still stuck on this problem.

If anyone has an idea how to make this work, please let me know. Please reply to this post only if you have a solution. Thank you.

The code below is C# on .NET Framework 4.5+.

The code below can download from other websites, but not from nasdaq.com.

    static void Main(string[] args)
    {
        try
        {
            string testUrl = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";
            HttpWebRequestTestDownload(testUrl);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }

    public static void HttpWebRequestTestDownload(string address)
    {
        //Example from 
        //https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.getresponse(v=vs.110).aspx

        System.Net.HttpWebRequest wReq = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(address);
        wReq.KeepAlive = false;

        System.Net.ServicePointManager.SecurityProtocol = System.Net.SecurityProtocolType.Ssl3;
        ServicePointManager.Expect100Continue = true;
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
        ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };

        //I also tried the below and still not working

        //wReq.AllowAutoRedirect = true;
        //wReq.KeepAlive = false;
        //wReq.Timeout = 10 * 60 * 1000;//10 minutes


        ////Accept-Encoding
        //wReq.Accept = "application/csv,application/json,text/csv,text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
        ////Request format text/html. Will improve this if necessary. Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
        ////http://www.useragentstring.com/ 
        //wReq.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36";
        //wReq.ProtocolVersion = HttpVersion.Version11;
        //// wReq.Headers.Add("Accept-Language", "en_eg");
        //wReq.ServicePoint.Expect100Continue = false;
        ////Fixing invalid SSL problem
        //System.Net.ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
        ////Fixing  the underlying connection was closed: An unexpected error occurred on a send for Framework 4.5 or higher
        //ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
        //wReq.Headers.Add("Accept-Encoding", "gzip, deflate");//Accept encoding



        // Set some reasonable limits on resources used by this request
        wReq.MaximumAutomaticRedirections = 4;
        wReq.MaximumResponseHeadersLength = 4;
        // Set credentials to use for this request.
        wReq.Credentials = System.Net.CredentialCache.DefaultCredentials;
        System.Net.HttpWebResponse response = (System.Net.HttpWebResponse)wReq.GetResponse();

        Console.WriteLine("Content length is {0}", response.ContentLength);
        Console.WriteLine("Content type is {0}", response.ContentType);

        // Get the stream associated with the response.
        System.IO.Stream receiveStream = response.GetResponseStream();

        // Pipes the stream to a higher level stream reader with the required encoding format. 
        System.IO.StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8);
        Console.WriteLine("Response stream received.");
        Console.WriteLine(readStream.ReadToEnd());
        response.Close();
        readStream.Close();

    }

    public static void WebClientTestDownload(string address)
    {
        System.Net.WebClient client = new System.Net.WebClient();
        string reply = client.DownloadString(address);
    }

Recommended answer

I was able to resolve the problem. A tip for everyone: use Fiddler to capture the network traffic and send the same headers. It worked once I had set all of the headers this website requires.

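    // url is the download address from the question.
    string url = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";

    using (WebClient web = new WebClient())
    {
        web.Headers[HttpRequestHeader.Host] = "www.nasdaq.com";
        web.Headers[HttpRequestHeader.Accept] = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
        // Note: WebClient does not decompress responses itself; if reply looks garbled, drop this header.
        web.Headers[HttpRequestHeader.AcceptEncoding] = "gzip, deflate";
        web.Headers[HttpRequestHeader.UserAgent] = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36";
        string reply = web.DownloadString(url);
    }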
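
Since the question used HttpWebRequest, the same idea (send the headers a browser sends) might be sketched with that class as well. This is only a minimal sketch under stated assumptions, not the answerer's code: the names BrowserLikeDownload, CreateRequest and Download are hypothetical, it assumes .NET Framework 4.5+ and that the server wants TLS 1.2, and it uses the AutomaticDecompression property instead of setting Accept-Encoding by hand so that a gzip response is decoded before it is read. The source does not confirm that nasdaq.com still serves this URL or accepts exactly this header set.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class; the names are illustrative only.
public static class BrowserLikeDownload
{
    // Builds (but does not send) a request carrying the browser-style headers from the answer.
    public static HttpWebRequest CreateRequest(string address)
    {
        // Assumption: the server requires TLS 1.2. Enable it without disabling other protocols.
        ServicePointManager.SecurityProtocol |= SecurityProtocolType.Tls12;

        var request = (HttpWebRequest)WebRequest.Create(address);
        // The Host header is derived from the URL, so it is not set explicitly here.
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
        request.UserAgent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Mobile Safari/537.36";
        // Sends "Accept-Encoding: gzip, deflate" and decompresses the response transparently.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        return request;
    }

    // Sends the request and returns the response body as text.
    public static string Download(string address)
    {
        HttpWebRequest request = CreateRequest(address);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // URL taken from the question.
        string url = "https://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=AMEX&render=download";
        try
        {
            Console.WriteLine(Download(url));
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```

If the request still fails, comparing the headers this code sends against a working browser request in Fiddler, as the answer suggests, should show which header is missing.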
