HttpClient crawling results in memory leak

Question

I am working on a WebCrawler implementation but am facing a strange memory leak in ASP.NET Web API's HttpClient.

The cut-down version is here:

I found the problem and it is not HttpClient that is leaking. See my answer.

I have added dispose with no effect:

    static void Main(string[] args)
    {
        int waiting = 0;             // requests currently in flight
        const int MaxWaiting = 100;  // crude cap on in-flight requests
        var httpClient = new HttpClient();
        foreach (var link in File.ReadAllLines("links.txt"))
        {
            // Poll until an in-flight request finishes and frees a slot.
            while (waiting >= MaxWaiting)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Waiting ...");
            }

            httpClient.GetAsync(link)
                .ContinueWith(t =>
                {
                    try
                    {
                        var httpResponseMessage = t.Result;
                        // Note: a non-success status neither decrements
                        // `waiting` nor disposes the response.
                        if (httpResponseMessage.IsSuccessStatusCode)
                            httpResponseMessage.Content.LoadIntoBufferAsync()
                                .ContinueWith(t2 =>
                                {
                                    if (t2.IsFaulted)
                                    {
                                        // Note: `waiting` is not decremented on this path.
                                        httpResponseMessage.Dispose();
                                        Console.ForegroundColor = ConsoleColor.Magenta;
                                        Console.WriteLine(t2.Exception);
                                    }
                                    else
                                    {
                                        httpResponseMessage.Content.ReadAsStringAsync()
                                            .ContinueWith(t3 =>
                                            {
                                                Interlocked.Decrement(ref waiting);

                                                try
                                                {
                                                    Console.ForegroundColor = ConsoleColor.White;
                                                    Console.WriteLine(httpResponseMessage.RequestMessage.RequestUri);
                                                    string s = t3.Result;
                                                }
                                                catch (Exception ex3)
                                                {
                                                    Console.ForegroundColor = ConsoleColor.Yellow;
                                                    Console.WriteLine(ex3);
                                                }
                                                httpResponseMessage.Dispose();
                                            });
                                    }
                                });
                    }
                    catch (Exception e)
                    {
                        Interlocked.Decrement(ref waiting);
                        Console.ForegroundColor = ConsoleColor.Red;
                        Console.WriteLine(e);
                    }
                });

            Interlocked.Increment(ref waiting);
        }

        Console.Read();
    }

The file containing links is available here.

This results in memory rising constantly. Memory analysis shows many bytes held, possibly by the AsyncCallback. I have done plenty of memory leak analysis before, but this one seems to be at the HttpClient level.

I am using C# 4.0, so there is no async/await here; only TPL 4.0 is used.

The code above works but is not optimised and sometimes throws a tantrum, yet it is enough to reproduce the effect. The point is that I cannot find anything in it that could cause memory to leak.

Recommended answer

OK, I got to the bottom of this. Thanks to @Tugberk, @Darrel and @youssef for spending time on this.

Basically, the initial problem was that I was spawning too many tasks. This started to take its toll, so I had to cut back and keep some state to make sure the number of concurrent tasks is limited. This is a big challenge when writing processes that have to use the TPL to schedule tasks. We can control the threads in the thread pool, but we also need to control the tasks we create, so no level of async/await will help with this.
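The answer does not show the throttling code itself, so the following is only a minimal sketch of one way to cap the number of in-flight tasks with TPL 4.0 primitives. The `Throttle` class and its `ForEach` method are hypothetical names made up for illustration; `SemaphoreSlim.Wait`, `SemaphoreSlim.Release` and `Task.ContinueWith` are standard .NET 4.0 members.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
static class Throttle
{
    // Calls startWork(item) for every item, but blocks the calling thread
    // whenever maxConcurrent operations are already in flight.
    // Returns once every started operation has completed.
    public static void ForEach<T>(IEnumerable<T> items, int maxConcurrent,
                                  Func<T, Task> startWork)
    {
        if (maxConcurrent < 1)
            throw new ArgumentOutOfRangeException("maxConcurrent");

        // Deliberately not disposed: if we leave early through an exception,
        // continuations that are still in flight will call Release later.
        var slots = new SemaphoreSlim(maxConcurrent);

        foreach (var item in items)
        {
            slots.Wait(); // blocks until one of the slots is free
            try
            {
                startWork(item).ContinueWith(t =>
                {
                    // Reading t.Exception marks the failure as observed.
                    if (t.IsFaulted) Console.Error.WriteLine(t.Exception);
                    slots.Release(); // free the slot on every outcome
                });
            }
            catch
            {
                slots.Release(); // startWork threw before returning a task
                throw;
            }
        }

        // Drain: once all slots are re-acquired, nothing is in flight.
        for (int i = 0; i < maxConcurrent; i++)
            slots.Wait();
    }
}
```

In the question's loop, `startWork` would be something like `link => httpClient.GetAsync(link).ContinueWith(...)` returning the last task in the chain, which replaces both the `waiting` counter and the `Thread.Sleep` polling. Because the slot is released in a single continuation that runs on success, failure and cancellation alike, no code path can forget to give it back.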

I managed to reproduce the leak only a couple of times with this code; other times, after growing, memory would suddenly drop. I know the GC was revamped in 4.5, so perhaps the issue here is that the GC did not kick in often enough, although I have been watching the perf counters for GC generation 0, 1 and 2 collections.
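Memory that grows and then suddenly drops suggests deferred collection rather than a true leak. A rough way to tell the two apart from inside the process is to force a full collection and see whether the managed heap shrinks. The sketch below assumes a hypothetical `GcProbe` helper; the `System.GC` members it calls (`Collect`, `WaitForPendingFinalizers`, `GetTotalMemory`, `CollectionCount`) are standard. It is meant as a diagnostic aid only, not something to leave in production code.

```csharp
using System;

// Hypothetical diagnostic helper, not part of the original answer.
static class GcProbe
{
    // Forces a full blocking collection (including finalizable objects)
    // and returns the size of the managed heap, in bytes, that survives it.
    public static long HeapAfterFullCollect()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(true);
    }

    // Cheap snapshot without forcing a collection: how many times each
    // generation has been collected so far, plus the current heap estimate.
    public static string Snapshot()
    {
        return string.Format("gen0={0} gen1={1} gen2={2} heap={3:N0} bytes",
            GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2),
            GC.GetTotalMemory(false));
    }
}
```

Logging `GcProbe.Snapshot()` every few hundred links and calling `GcProbe.HeapAfterFullCollect()` occasionally gives two curves to compare. If the post-collection figure stays flat while the snapshot figure climbs, the objects are collectable and the GC simply has not run; if both climb, something is still holding references.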
