Parallel tasks performance in C#


Question

I need to make my tasks run faster. I have tried a semaphore, the Parallel library, and raw threads (one per work item, which I know is the worst option), but none of them gives the performance I need. I'm not familiar with threading, and I need help finding the right approach and understanding how tasks and threads work.

Here is the function:

 public class Test
    {
        public void openThreads()
        {
            int maxConcurrency = 500;
            var someWork = get_data_from_database();
            using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
            {
                List<Task> tasks = new List<Task>();
                foreach (var work in someWork)
                {
                    concurrencySemaphore.Wait();

                    var t = Task.Factory.StartNew(() =>
                    {
                        try
                        {
                            ScrapThings(work);
                        }
                        finally
                        {
                            concurrencySemaphore.Release();
                        }
                    });

                    tasks.Add(t);
                }

                Task.WaitAll(tasks.ToArray());
            }
        }

        public async Task ScrapThings(Object work)
        {
            HttpClient client = new HttpClient();
            Encoding utf8 = Encoding.UTF8;
            var response = client.GetAsync(work.url).Result;
            var buffer = response.Content.ReadAsByteArrayAsync().Result;
            string content = utf8.GetString(buffer);
            /*
             Do some parse operations, load html document, get xpath, split things, etc 
             */

            while(true) // this loop runs from 1~15 times
            {
                response = client.GetAsync(work.anotherUrl).Result;
                buffer = response.Content.ReadAsByteArrayAsync().Result;
                content = utf8.GetString(buffer);
                if (content == "OK")
                    break;

                await Task.Delay(10000); //I need some throttle here before it tries again
            }
            /*
                Do some parse operations, load html document, get xpath, split things, etc 
                */
            update_things_in_database();
        }
    }

I want to run this task 500 times in parallel. The whole operation takes 18 hours to complete and I need to reduce that. I'm using a Xeon with 32 cores/64 threads. I tried opening 500 threads (which performed better than the semaphore and the Parallel library), but it doesn't feel like the right way to do it.

Recommended answer

I would say the performance problem lies not in how you run your threads but in how each individual thread performs. Depending on the version of .NET and the libraries you are using, there are a few possible issues.

  1. You should reuse HttpClient instances rather than creating a new one for every request; a client per request can exhaust the available sockets under load.
  2. If work.url and work.anotherUrl use the same subset of domains, you should look into the connection limit per endpoint (and the total limit as well). Depending on the version, that is either HttpClientHandler.MaxConnectionsPerServer, or ServicePoint.ConnectionLimit and ServicePointManager.DefaultConnectionLimit. The former is for .NET Core and the latter for the full .NET Framework.
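As a minimal sketch of both points on .NET Core, the class below shares one HttpClient across all requests and raises the per-server connection limit on its handler. The class name `Scraper`, the method `FetchAsync`, the limit of 50, and the 30-second timeout are illustrative assumptions, not values from the question.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class Scraper
{
    // One handler shared by every request, so connections are pooled and reused.
    // The limit of 50 is an illustrative value; tune it for the target servers.
    private static readonly HttpClientHandler Handler = new HttpClientHandler
    {
        MaxConnectionsPerServer = 50
    };

    // A single long-lived client instead of "new HttpClient()" per work item.
    public static readonly HttpClient Client = new HttpClient(Handler)
    {
        Timeout = TimeSpan.FromSeconds(30) // illustrative timeout
    };

    // Hypothetical helper: downloads a page and decodes it as UTF-8,
    // awaiting the I/O instead of blocking a thread with .Result.
    public static async Task<string> FetchAsync(string url)
    {
        using (var response = await Client.GetAsync(url).ConfigureAwait(false))
        {
            byte[] buffer = await response.Content.ReadAsByteArrayAsync().ConfigureAwait(false);
            return Encoding.UTF8.GetString(buffer);
        }
    }
}
```

With this in place, `ScrapThings` could call `await Scraper.FetchAsync(work.url)` rather than constructing its own client.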

The recommended approach to solve the first issue is to use IHttpClientFactory.
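A rough sketch of that approach follows. It assumes the Microsoft.Extensions.Http NuGet package (which also brings in the dependency injection container) is installed; the client name "scraper", the class name `ScraperClients`, and the timeout are hypothetical choices for illustration.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

public static class ScraperClients
{
    // Build the service provider once for the lifetime of the application.
    private static readonly ServiceProvider Provider = BuildProvider();

    private static ServiceProvider BuildProvider()
    {
        var services = new ServiceCollection();
        // Register a named client; the factory pools and recycles the handlers behind it.
        services.AddHttpClient("scraper", c => c.Timeout = TimeSpan.FromSeconds(30));
        return services.BuildServiceProvider();
    }

    // Clients created by the factory are cheap; the underlying handlers are shared.
    public static HttpClient Create()
    {
        var factory = Provider.GetRequiredService<IHttpClientFactory>();
        return factory.CreateClient("scraper");
    }
}
```

Each work item can then call `ScraperClients.Create()` without opening a fresh set of connections.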


Update

You mentioned in the comments that you are using .NET 4.7.2, so I would suggest starting by adding the following lines to your application (somewhere at startup):

// requires: using System; using System.Net;
ServicePointManager.DefaultConnectionLimit = 500;
// if you can get a collection of the most frequently scraped domains:
var domains = new[] { "http://slowwly.robertomurray.co.uk" };
foreach (var d in domains)
{
    var delayServicePoint = ServicePointManager.FindServicePoint(new Uri(d));
    delayServicePoint.ConnectionLimit = 10; // or bigger
}
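Separately from the connection limits, the loop in the question passes the async `ScrapThings` to `Task.Factory.StartNew` without awaiting it, so the semaphore slot is released as soon as the method reaches `await Task.Delay`, and each `.Result` call blocks a thread-pool thread. One possible way to keep the throttle accurate is sketched below; `Throttle`, `RunAsync`, and `RunOneAsync` are hypothetical names, and this is not code from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs "work" for every item with at most maxConcurrency operations in flight.
    public static async Task RunAsync<T>(IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = new List<Task>();
            foreach (var item in items)
            {
                // Wait for a free slot without blocking a thread.
                await gate.WaitAsync().ConfigureAwait(false);
                tasks.Add(RunOneAsync(item, gate, work));
            }
            // Completes only after every operation has really finished.
            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }

    private static async Task RunOneAsync<T>(T item, SemaphoreSlim gate, Func<T, Task> work)
    {
        try
        {
            // Awaiting here holds the slot for the full duration of the work.
            await work(item).ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Assuming `ScrapThings` is made fully asynchronous, a call could look like `await Throttle.RunAsync(someWork, 500, w => ScrapThings(w));`.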
