Multithreaded WebClient requests return error - System.Net.WebException


Problem description

I have 5000+ pages that I want to download using WebClient. Since I want that done as fast as possible, I am trying to use multithreading (a BlockingCollection in my case), but the program always seems to crash after a while with a System.Net.WebException. If I add a Thread.Sleep(3000) delay, it slows down the download and the error still occurs, just a little later.

It usually takes about 2-3 seconds to download one page.

Normally I would guess that the problem is in my BlockingCollection, but it works fine with other tasks, so I am fairly sure that something is wrong with my WebClient requests. I think there might be some kind of overlap between the separate WebClient instances, but that is just a guess.

        Multithreading multiThread = new Multithreading(5); 
        for(int pageNumber = 1; pageNumber <= 5181; pageNumber++)
        {
            multiThread.EnqueueTask(new Action(() => //add task ("scrape the trader") to the multithread queue
            {
                using (WebClient client = new WebClient())
                {
                    client.DownloadFile("http://example.com/page=" + pageNumber.ToString(), @"C:\mypages\page " + pageNumber.ToString() + ".html");
                } 
            }));
            //I put the Thread.Sleep(123) delay here
        }

If I add a smaller delay (Thread.Sleep(100), for example) it works fine, but then I end up scraping Page # *whatever pageNumber's value is at the moment*, not the pages in order as it usually does.
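The "whatever pageNumber's value is at the moment" symptom is consistent with how a C# lambda captures a for-loop variable: every queued Action shares the single pageNumber variable, so a task that runs late sees a later value, and several tasks may end up targeting the same page and the same output file. The following is a minimal sketch of that behaviour without any network access; the class and method names (CaptureDemo, RunShared, RunCopied) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class CaptureDemo
{
    // Mirrors the loop in the question: each lambda captures the loop
    // variable itself, so all lambdas observe its final value.
    public static List<int> RunShared(int count)
    {
        var seen = new List<int>();
        var queued = new List<Action>();
        for (int pageNumber = 1; pageNumber <= count; pageNumber++)
        {
            queued.Add(() => seen.Add(pageNumber)); // shared variable
        }
        foreach (Action action in queued) action(); // run after the loop ends
        return seen;
    }

    // Copies the loop variable into a local declared inside the loop body,
    // so each lambda captures its own value.
    public static List<int> RunCopied(int count)
    {
        var seen = new List<int>();
        var queued = new List<Action>();
        for (int pageNumber = 1; pageNumber <= count; pageNumber++)
        {
            int page = pageNumber; // per-iteration copy
            queued.Add(() => seen.Add(page));
        }
        foreach (Action action in queued) action();
        return seen;
    }
}
```

With count = 3, RunShared yields 4, 4, 4 (the value after the loop has finished), while RunCopied yields 1, 2, 3. Applied to the loop in the question, the same per-iteration copy would go inside the for body, before the EnqueueTask call. Whether this is the only cause of the WebException cannot be confirmed from the post alone.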

Here is my BlockingCollection wrapper (I think I got this code from Stack Overflow):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Multithreading : IDisposable
{
    // Queue of pending work items shared by all consumer tasks.
    BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();

    public Multithreading(int workerCount)
    {
        // Create and start a separate Task for each consumer:
        for (int i = 0; i < workerCount; i++)
            Task.Factory.StartNew(Consume);
    }

    // Signals that no more work will be added; consumers exit once the queue drains.
    public void Dispose() { _taskQ.CompleteAdding(); }

    public void EnqueueTask(Action action) { _taskQ.Add(action); }

    void Consume()
    {
        // This sequence that we're enumerating will block when no elements
        // are available and will end when CompleteAdding is called.
        foreach (Action action in _taskQ.GetConsumingEnumerable())
            action();     // Perform task.
    }
}

I also tried putting everything into an endless while loop and handling the error with try...catch, but apparently the error is not thrown immediately; it appears after a while (I am not sure when).

This is the entire exception:

An exception of type 'System.Net.WebException' occurred in System.dll but was not handled in user code

Additional information: An exception occurred during a WebClient request.
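The message above is WebClient's generic wrapper text, so the actual cause usually has to be read from the exception's Status and InnerException properties. The following is a small sketch of a logging helper for that; the class name WebExceptionReport and the method Describe are hypothetical, and the output format is only an illustration.

```csharp
using System;
using System.Net;

static class WebExceptionReport
{
    // Hypothetical helper: flattens a WebException into one line for logging.
    public static string Describe(WebException ex)
    {
        // The inner exception, when present, typically names the real failure
        // (for example an I/O error on the target file).
        string inner = ex.InnerException == null
            ? "none"
            : ex.InnerException.GetType().Name + ": " + ex.InnerException.Message;
        return "Status=" + ex.Status + "; Inner=" + inner;
    }
}
```

Calling Describe inside a catch (WebException ex) block around DownloadFile and writing the result to a log would show whether the failures are network errors or something local, such as two tasks writing to the same file.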

Recommended answer

The WebClient class is not guaranteed to be thread safe. From MSDN:

Any instance members are not guaranteed to be thread safe.

Update

Use one HttpWebRequest for each request that you make. If you make a lot of requests to different web sites, it doesn't matter whether you use WebClient or HttpWebRequest.

If you make a lot of requests to the same web site, it is still not as inefficient as it seems. HttpWebRequest reuses connections under the hood. Microsoft uses something called service points, which you can access through the HttpWebRequest.ServicePoint property. If you click on the property definition, you come to the ServicePoint documentation, where you can fine-tune the number of connections per web site, among other settings.
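As a rough illustration of the service point tuning mentioned above, the sketch below creates one HttpWebRequest per URL and raises the connection limit of its service point. It assumes the .NET Framework-era System.Net APIs that the answer refers to (newer .NET versions mark them obsolete); the class name RequestFactory, the method Create, and the limit of 8 are hypothetical choices for illustration, not recommended values.

```csharp
using System;
using System.Net;

static class RequestFactory
{
    // Hypothetical helper: builds a fresh request for one URL and sets the
    // maximum number of concurrent connections allowed to that host.
    public static HttpWebRequest Create(string url, int connectionLimit)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // The ServicePoint is shared by all requests to the same host,
        // so this setting affects later requests to that host as well.
        request.ServicePoint.ConnectionLimit = connectionLimit;
        return request;
    }
}
```

For example, RequestFactory.Create("http://example.com/page=1", 8) returns a request that has not been sent yet; calling GetResponse() on it would perform the download. The limit can also be set globally through ServicePointManager.DefaultConnectionLimit before any request is created.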
