Why does my WebClient return a 404 error most of the time, but not always?

Question

I want to get information about a Microsoft Update in my program. However, the server returns a 404 error about 80% of the time. I boiled the problematic code down to this console application:

using System;
using System.Net;

namespace WebBug
{
    class Program
    {
        static void Main(string[] args)
        {
            while (true)
            {
                try
                {
                    WebClient client = new WebClient();
                    Console.WriteLine(client.DownloadString("https://support.microsoft.com/api/content/kb/3068708"));
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                Console.ReadKey();
            }
        }
    }
}

When I run the code, I have to get through the loop a few times until I get an actual response:

The remote server returned an error: (404) Not found.
The remote server returned an error: (404) Not found.
The remote server returned an error: (404) Not found.
<div kb-title title="Update for customer experience and diagnostic telemetry [...]

I can open and force refresh (Ctrl + F5) the link in my browser as often as I want to, but it'll show fine.

The problem occurs on two different machines with two different internet connections.
I've also tested this case using the Html Agility Pack, but with the same result.
The problem does not occur with other websites. (The root https://support.microsoft.com works fine 100 % of the time)

Why do I get this weird result?

Recommended answer

Cookies. It's because of cookies.

As I started to dig into this problem, I noticed that the first time I opened the site in a new browser I got a 404, but after refreshing (sometimes once, sometimes a few times) the site loaded and kept working.

That's when I busted out Chrome's Incognito mode and the developer tools.

There wasn't anything too fishy with the network: there was a simple redirect to the https version if you loaded http.

But what I did notice was that the cookies changed. This is what I saw the first time I loaded the page:

[Screenshot not preserved: cookie list on first load]

and here's the page after one or a few refreshes:

[Screenshot not preserved: cookie list after refreshing]

Notice how a few more cookie entries got added? The site must be trying to read those, not finding them, and "blocking" you. This might be a bot-prevention device or bad programming, I'm not sure.
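
One way to check this hypothesis from code is to reuse a single CookieContainer across several attempts and print what accumulates in it. This is only a rough sketch: it assumes the server sets these cookies through Set-Cookie response headers, which the answer does not confirm (they could just as well be set by JavaScript, in which case the jar stays empty). CookieProbe, CookieNames and Fetch are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

public static class CookieProbe
{
    // Returns the names of the cookies the jar would send to the given URI.
    public static string[] CookieNames(CookieContainer jar, Uri uri)
    {
        CookieCollection cookies = jar.GetCookies(uri);
        var names = new string[cookies.Count];
        for (int i = 0; i < cookies.Count; i++)
        {
            names[i] = cookies[i].Name;
        }
        return names;
    }

    // Sends one request with the shared jar and returns the HTTP status code.
    public static int Fetch(Uri uri, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.CookieContainer = jar;
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return (int)response.StatusCode;
            }
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse error)
        {
            // GetResponse throws on 4xx/5xx; the response is still attached to the exception.
            using (error)
            {
                return (int)error.StatusCode;
            }
        }
    }

    public static void Main()
    {
        var uri = new Uri("https://support.microsoft.com/api/content/kb/3068708");
        var jar = new CookieContainer(); // shared across attempts, like a browser session

        for (int attempt = 1; attempt <= 3; attempt++)
        {
            int status = Fetch(uri, jar);
            Console.WriteLine("Attempt {0}: HTTP {1}, cookies: {2}",
                attempt, status, string.Join(", ", CookieNames(jar, uri)));
        }
    }
}
```

If the cookie list grows between attempts and the status flips from 404 to 200, that would support the cookie explanation; if the list stays empty, the cookies are probably set client-side and have to be supplied by hand.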

Anyways, here's how to make your code work. This example uses the HttpWebRequest/Response, not WebClient.

using System.IO;  // StreamReader
using System.Net; // HttpWebRequest, CookieContainer, Cookie

string url = "https://support.microsoft.com/api/content/kb/3068708";

// This holds all the cookies we need to add.
// Notice the values match the ones in the screenshot above.
CookieContainer cookieJar = new CookieContainer();
cookieJar.Add(new Cookie("SMCsiteDir", "ltr", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("SMCsiteLang", "en-US", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("smc_f", "upr", "/", ".support.microsoft.com"));
cookieJar.Add(new Cookie("smcexpsessionticket", "100", "/", ".microsoft.com"));
cookieJar.Add(new Cookie("smcexpticket", "100", "/", ".microsoft.com"));
cookieJar.Add(new Cookie("smcflighting", "wwp", "/", ".microsoft.com"));

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
// Attach the cookie container.
request.CookieContainer = cookieJar;

// And now go to the internet, fetching back the contents.
// The using statements dispose the response and the reader when done.
string site;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
    site = sr.ReadToEnd();
}

If you remove the line request.CookieContainer = cookieJar;, the request fails with a 404, which reproduces your issue.

Most of the legwork for the code example came from this post and this post.
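
Since the question uses WebClient, the same idea can probably be kept there by subclassing it and attaching the cookie container in GetWebRequest, which WebClient calls for each request. The following is a minimal sketch, not a tested fix: CookieAwareWebClient, KbDownloader and CreateClient are hypothetical names, the cookie values are simply copied from the example above, and it assumes the endpoint still behaves as described.

```csharp
using System;
using System.Net;

// Hypothetical helper: a WebClient that sends a CookieContainer with every HTTP request.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    // WebClient builds each request through this method, so the jar is attached here.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.CookieContainer = Cookies;
        }
        return request;
    }
}

public static class KbDownloader
{
    // Creates a client preloaded with the cookies from the answer's example.
    public static CookieAwareWebClient CreateClient()
    {
        var client = new CookieAwareWebClient();
        client.Cookies.Add(new Cookie("SMCsiteDir", "ltr", "/", ".support.microsoft.com"));
        client.Cookies.Add(new Cookie("SMCsiteLang", "en-US", "/", ".support.microsoft.com"));
        client.Cookies.Add(new Cookie("smc_f", "upr", "/", ".support.microsoft.com"));
        client.Cookies.Add(new Cookie("smcexpsessionticket", "100", "/", ".microsoft.com"));
        client.Cookies.Add(new Cookie("smcexpticket", "100", "/", ".microsoft.com"));
        client.Cookies.Add(new Cookie("smcflighting", "wwp", "/", ".microsoft.com"));
        return client;
    }

    public static void Main()
    {
        using (CookieAwareWebClient client = CreateClient())
        {
            Console.WriteLine(client.DownloadString("https://support.microsoft.com/api/content/kb/3068708"));
        }
    }
}
```

Keeping one client instance across the loop in the original program would also let any cookies the server sets carry over between attempts, instead of starting from an empty session on every iteration.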
