Improve performance of Async Post using C# HttpClient

Question

I am trying to find ways to further improve the performance of my console app (which is already fully working).

I have a CSV file that contains a list of addresses (about 100k). I need to query a Web API whose POST response contains the geographical coordinates of each address. I then write a GeoJSON file to the file system with the address data enriched with those coordinates (latitude and longitude).

My current solution splits the data into batches of 1000 records and sends async POST requests to the Web API using HttpClient (a .NET Core 3.1 console app with a class library targeting .NET Standard 2.0). GeoJSON is my DTO class.

public class GeoJSON
{
    public string Locality { get; set; }
    public string Street { get; set; }
    public string StreetNumber { get; set; }
    public string ZIP { get; set; }
    public string Latitude { get; set; }
    public string Longitude { get; set; }
}


// batchSize, geoJSONs, client and URL are static members declared elsewhere in the class.
public static async Task<List<GeoJSON>> GetAddressesInParallel(List<GeoJSON> geos)
{
    // Number of batches for the configured batch size (1000), rounded up.
    int numberOfBatches = (int)Math.Ceiling((double)geos.Count / batchSize);

    for (int i = 0; i < numberOfBatches; i++)
    {
        // Start every request of the current batch, then wait for all of them
        // before moving on to the next batch.
        var currentIds = geos.Skip(i * batchSize).Take(batchSize);
        var tasks = currentIds.Select(id => SendPOSTAsync(id));
        geoJSONs.AddRange(await Task.WhenAll(tasks));
    }

    return geoJSONs;
}

My async POST method looks like this:

public static async Task<GeoJSON> SendPOSTAsync(GeoJSON geo)
{
    string payload = JsonConvert.SerializeObject(geo);
    HttpContent c = new StringContent(payload, Encoding.UTF8, "application/json");
    using HttpResponseMessage response = await client.PostAsync(URL, c).ConfigureAwait(false);

    if (response.IsSuccessStatusCode)
    {
        // Copy the coordinates from the response onto the object that was passed in.
        var address = JsonConvert.DeserializeObject<GeoJSON>(await response.Content.ReadAsStringAsync());
        geo.Latitude = address.Latitude;
        geo.Longitude = address.Longitude;
    }
    return geo;
}

The Web API runs on my local machine as a self-hosted x86 application. The whole application finishes in less than 30 seconds. The most time-consuming part is the async POST part (about 25 seconds). The Web API accepts only one address per POST; otherwise I would have sent multiple addresses in one request.

Any ideas on how to improve the performance of the requests against the Web API?

Recommended answer

A potential problem with your batching approach is that a single delayed response may delay the completion of a whole batch. It may not be an actual problem, because the web service you are calling may have very consistent response times, but in any case you could try an alternative approach that controls the concurrency without batching. The example below uses the TPL Dataflow library, which is built into the .NET Core platform and available as a package for .NET Framework:

public static async Task<List<GeoJSON>> GetAddressesInParallel(List<GeoJSON> geos)
{
    var block = new ActionBlock<GeoJSON>(async item =>
    {
        await SendPOSTAsync(item);
    }, new ExecutionDataflowBlockOptions()
    {
        MaxDegreeOfParallelism = 1000
    });

    foreach (var item in geos)
    {
        await block.SendAsync(item);
    }
    block.Complete();

    await block.Completion;
    return geos;
}

Your SendPOSTAsync method just returns the same GeoJSON that it receives as its argument, so GetAddressesInParallel can likewise return the same List<GeoJSON> that it receives as its argument.
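To see the concurrency cap at work without a web service, here is a minimal sketch that replaces the HTTP call with a short Task.Delay and records how many actions run at the same time. The class and method names (ThrottleDemo, MeasurePeakConcurrencyAsync) are made up for this illustration and are not part of the original code.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottleDemo
{
    // Hypothetical helper: feeds itemCount dummy items through an ActionBlock
    // and returns the highest number of actions observed running at once.
    public static async Task<int> MeasurePeakConcurrencyAsync(int itemCount, int maxParallelism)
    {
        int running = 0;
        int peak = 0;

        var block = new ActionBlock<int>(async _ =>
        {
            int now = Interlocked.Increment(ref running);

            // Raise the recorded peak if this action pushed the count higher.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                Interlocked.CompareExchange(ref peak, now, seen);
            }

            // Stand-in for the HTTP request (assumption: 10 ms per call).
            await Task.Delay(10).ConfigureAwait(false);

            Interlocked.Decrement(ref running);
        }, new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallelism
        });

        foreach (var i in Enumerable.Range(0, itemCount))
        {
            await block.SendAsync(i).ConfigureAwait(false);
        }
        block.Complete();

        await block.Completion.ConfigureAwait(false);
        return peak;
    }
}
```

Calling `await ThrottleDemo.MeasurePeakConcurrencyAsync(50, 4)` should never report more than 4. As soon as one action finishes, the block starts the next queued item, so a slow item only occupies one slot instead of holding up a whole batch.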

The ActionBlock is the simplest of the blocks available in the library. It just executes a sync or async action for every item, and lets you configure MaxDegreeOfParallelism among other options. You could also try splitting your workflow into multiple blocks and then linking them together to form a pipeline. For example:

  1. TransformBlock<GeoJSON, (GeoJSON, string)> that serializes the GeoJSON objects to JSON.
  2. TransformBlock<(GeoJSON, string), (GeoJSON, string)> that makes the HTTP requests.
  3. ActionBlock<(GeoJSON, string)> that deserializes the HTTP responses and updates the GeoJSON objects with the received values.

Such an arrangement would allow you to fine-tune the MaxDegreeOfParallelism of each block separately, and hopefully achieve optimal performance.
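The three-block arrangement described above could be sketched roughly as follows. This is one possible reading of it, under several assumptions:

- The HTTP call is passed in as a hypothetical postJson delegate, so the pipeline can be exercised without a running server.
- System.Text.Json is swapped in for the question's JsonConvert, only to keep the sketch free of package dependencies.
- The names GeoPipeline, RunAsync and HttpPoster, and the parallelism defaults, are placeholders to be tuned by measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class GeoJSON
{
    public string Locality { get; set; }
    public string Street { get; set; }
    public string StreetNumber { get; set; }
    public string ZIP { get; set; }
    public string Latitude { get; set; }
    public string Longitude { get; set; }
}

public static class GeoPipeline
{
    // postJson (hypothetical): takes the request JSON and returns the response
    // JSON, or null when the request did not succeed.
    public static async Task<List<GeoJSON>> RunAsync(
        List<GeoJSON> geos,
        Func<string, Task<string>> postJson,
        int maxHttpParallelism = 100)
    {
        var cpuBound = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // 1. Serialize each GeoJSON object to JSON.
        var serialize = new TransformBlock<GeoJSON, (GeoJSON, string)>(
            geo => (geo, JsonSerializer.Serialize(geo)), cpuBound);

        // 2. Make the HTTP request; Item2 becomes the response body (or null).
        var post = new TransformBlock<(GeoJSON, string), (GeoJSON, string)>(
            async pair => (pair.Item1, await postJson(pair.Item2).ConfigureAwait(false)),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxHttpParallelism });

        // 3. Deserialize the response and update the original object.
        var update = new ActionBlock<(GeoJSON, string)>(pair =>
        {
            if (pair.Item2 == null) return; // failed request: leave the item unchanged
            var address = JsonSerializer.Deserialize<GeoJSON>(pair.Item2);
            pair.Item1.Latitude = address.Latitude;
            pair.Item1.Longitude = address.Longitude;
        }, cpuBound);

        // Link the blocks; PropagateCompletion lets Complete() flow downstream.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        serialize.LinkTo(post, link);
        post.LinkTo(update, link);

        foreach (var geo in geos)
        {
            await serialize.SendAsync(geo).ConfigureAwait(false);
        }
        serialize.Complete();

        await update.Completion.ConfigureAwait(false);
        return geos;
    }

    // Builds a postJson delegate on top of an existing HttpClient.
    public static Func<string, Task<string>> HttpPoster(HttpClient client, string url) =>
        async json =>
        {
            using var content = new StringContent(json, Encoding.UTF8, "application/json");
            using var response = await client.PostAsync(url, content).ConfigureAwait(false);
            return response.IsSuccessStatusCode
                ? await response.Content.ReadAsStringAsync().ConfigureAwait(false)
                : null;
        };
}
```

With the question's client and URL, a call might look like `await GeoPipeline.RunAsync(geos, GeoPipeline.HttpPoster(client, URL))`. Whether the split actually beats the single ActionBlock depends on how much time serialization takes relative to the HTTP round trip, so it is worth timing both variants.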
