HttpClient crawling results in memory leak

Question

I am working on a WebCrawler implementation, but I am facing a strange memory leak in ASP.NET Web API's HttpClient.

The cut-down version is below.

I found the problem, and it is not HttpClient that is leaking. See my answer.

I have added calls to Dispose, with no effect:

    static void Main(string[] args)
    {
        int waiting = 0;
        const int MaxWaiting = 100;
        var httpClient = new HttpClient();
        foreach (var link in File.ReadAllLines("links.txt"))
        {

            while (waiting>=MaxWaiting)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Waiting ...");
            }
            httpClient.GetAsync(link)
                .ContinueWith(t =>
                                  {
                                      try
                                      {
                                          var httpResponseMessage = t.Result;
                                          if (httpResponseMessage.IsSuccessStatusCode)
                                              httpResponseMessage.Content.LoadIntoBufferAsync()
                                                  .ContinueWith(t2=>
                                                                    {
                                                                        if(t2.IsFaulted)
                                                                        {
                                                                            httpResponseMessage.Dispose();
                                                                            Console.ForegroundColor = ConsoleColor.Magenta;
                                                                            Console.WriteLine(t2.Exception);
                                                                        }
                                                                        else
                                                                        {
                                                                            httpResponseMessage.Content.
                                                                                ReadAsStringAsync()
                                                                                .ContinueWith(t3 =>
                                                                                {
                                                                                    Interlocked.Decrement(ref waiting);

                                                                                    try
                                                                                    {
                                                                                        Console.ForegroundColor = ConsoleColor.White;

                                                                                        Console.WriteLine(httpResponseMessage.RequestMessage.RequestUri);
                                                                                        string s =
                                                                                            t3.Result;

                                                                                    }
                                                                                    catch (Exception ex3)
                                                                                    {
                                                                                        Console.ForegroundColor = ConsoleColor.Yellow;

                                                                                        Console.WriteLine(ex3);
                                                                                    }
                                                                                    httpResponseMessage.Dispose();
                                                                                });                                                                                
                                                                        }
                                                                    }
                                                  );
                                      }
                                      catch(Exception e)
                                      {
                                          Interlocked.Decrement(ref waiting);
                                          Console.ForegroundColor = ConsoleColor.Red;                                             
                                          Console.WriteLine(e);
                                      }
                                  }
                );

            Interlocked.Increment(ref waiting);

        }

        Console.Read();
    }

The file containing the links is here.

This results in memory rising constantly. Memory analysis shows many bytes held, possibly by the AsyncCallback. I have done a lot of memory leak analysis before, but this one seems to be at the HttpClient level.

I am using C# 4.0, so there is no async/await here; only TPL 4.0 is used.

The code above works but is not optimised, and it sometimes throws a tantrum, yet it is enough to reproduce the effect. The point is that I cannot find anything that could cause memory to be leaked.

Recommended answer

OK, I got to the bottom of this. Thanks to @Tugberk, @Darrel and @youssef for spending time on this.

Basically, the initial problem was that I was spawning too many tasks. This started to take its toll, so I had to cut back and keep some state to make sure the number of concurrent tasks is limited. This is a big challenge when writing processes that have to use the TPL to schedule tasks. We can control the threads in the thread pool, but we also need to control the tasks we create, so no amount of async/await will help with this.
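One way to keep that state is a counting semaphore: take a slot before starting each task and give it back from a continuation that runs on every outcome. The following is only a minimal sketch under that assumption, written without async/await so it fits C# 4.0. The names `Throttle` and `ForEachThrottled` are made up for illustration and are not part of the TPL or of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class Throttle
{
    // Hypothetical helper: starts one task per item, but never lets more than
    // maxConcurrent of them be in flight at once. The calling thread blocks
    // while the limit is reached, then waits for the remaining work to finish.
    public static void ForEachThrottled<T>(
        IEnumerable<T> items, int maxConcurrent, Func<T, Task> startWork)
    {
        // Deliberately not disposed: a continuation may still call Release
        // if startWork throws and this method exits early.
        var slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

        foreach (var item in items)
        {
            slots.Wait();                    // blocks until a slot is free

            Task work;
            try
            {
                work = startWork(item);
            }
            catch
            {
                slots.Release();             // the work never started
                throw;
            }

            // Give the slot back on success, fault, or cancellation.
            work.ContinueWith(t =>
            {
                var ignored = t.Exception;   // mark any fault as observed
                slots.Release();
            }, TaskContinuationOptions.ExecuteSynchronously);
        }

        // Drain: once every slot can be reacquired, all started work is done.
        for (int i = 0; i < maxConcurrent; i++)
            slots.Wait();
    }
}
```

With the code from the question, the call might look like `Throttle.ForEachThrottled(File.ReadAllLines("links.txt"), 100, link => httpClient.GetAsync(link).ContinueWith(...))`, where the lambda should return the task that represents the whole chain for that link. Because the slot is released in a single continuation, paths such as a non-success status code or a faulted read cannot forget to decrement the counter.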

I managed to reproduce the leak only a couple of times with this code; at other times, after growing, memory would just suddenly drop. I know that there was a revamp of the GC in 4.5, so perhaps the issue here is that the GC did not kick in often enough, although I have been looking at the perf counters for GC generation 0, 1 and 2 collections.
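To tell a real leak from collections that simply have not happened yet, one option is to force a full collection at a few checkpoints and compare what is still reachable. The sketch below assumes that approach. `LeakCheck`, `LiveBytesAfterFullGc` and `Report` are hypothetical names, and forcing collections like this is meant as a diagnostic aid only, not something to leave in a crawler.

```csharp
using System;

static class LeakCheck
{
    // Hypothetical helper: bytes still reachable after a forced full collection.
    public static long LiveBytesAfterFullGc()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();   // let finalizers release what they hold
        GC.Collect();                    // collect objects freed by those finalizers
        return GC.GetTotalMemory(true);
    }

    // Prints the live size and how many collections each generation has had.
    public static void Report(string label)
    {
        Console.WriteLine("{0}: live={1:N0} bytes, gen0={2}, gen1={3}, gen2={4}",
            label, LiveBytesAfterFullGc(),
            GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2));
    }
}
```

Calling `LeakCheck.Report(...)` every few hundred links gives a series of live sizes. If that figure keeps climbing even after forced collections, something is holding references; if it stays flat while the process size grows, the growth is more likely memory the GC has not yet reclaimed or returned.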
