HttpClient with multiple proxies while handling socket exhaustion and DNS recycling


Question

A friend and I are working on a fun project that has to execute hundreds of HTTP requests, each through a different proxy. Imagine something like the following:

for (int i = 0; i < 20; i++)
{
    HttpClientHandler handler = new HttpClientHandler { Proxy = new WebProxy(randomProxy, true) };

    using (var client = new HttpClient(handler))
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, "http://x.com"))
        {
            var response = await client.SendAsync(request);

            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
            }
        }

        using (var request2 = new HttpRequestMessage(HttpMethod.Get, "http://x.com/news"))
        {
            var response = await client.SendAsync(request2);

            if (response.IsSuccessStatusCode)
            {
                string content = await response.Content.ReadAsStringAsync();
            }
        }
    }
}

By the way, we are using .NET Core (a console application for now). I know there are many threads about socket exhaustion and DNS recycling, but this one is different because multiple proxies are involved.

If we use a singleton instance of HttpClient, as everyone suggests:

  • We can't set more than one proxy, because the proxy is configured on the handler when the HttpClient is instantiated and cannot be changed afterwards.
  • It doesn't respect DNS changes. A reused HttpClient holds on to its socket until the socket is closed, so if the server's DNS record is updated, the client will not notice until that socket is closed. One workaround is to disable keep-alive (send Connection: close), so the socket is closed after each request, but that leads to sub-optimal performance. The second way is to use ServicePoint:
ServicePointManager.FindServicePoint("http://x.com")  
    .ConnectionLeaseTimeout = Convert.ToInt32(TimeSpan.FromSeconds(15).TotalMilliseconds);

ServicePointManager.DnsRefreshTimeout = Convert.ToInt32(TimeSpan.FromSeconds(5).TotalMilliseconds);
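
For comparison, the first workaround (disabling keep-alive) can be sketched as below. This is a minimal illustration rather than a recommendation; the class name KeepAliveOff is hypothetical, and the only framework member relied on is HttpRequestHeaders.ConnectionClose.

```csharp
using System;
using System.Net.Http;

public static class KeepAliveOff
{
    // Hypothetical helper: builds a client that asks for the connection
    // to be closed after every response.
    public static HttpClient Create()
    {
        var client = new HttpClient();

        // Adds "Connection: close" to every request sent by this client,
        // so connections are not kept alive and reused between requests.
        client.DefaultRequestHeaders.ConnectionClose = true;

        return client;
    }
}
```

Every request then pays for a fresh TCP (and possibly TLS) handshake, which is the performance cost mentioned in the list above.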

On the other hand, disposing HttpClient after each use (as in my example above), in other words creating many HttpClient instances, leaves many sockets in the TIME_WAIT state. TIME_WAIT indicates that the local endpoint (this side) has closed the connection.
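
One way to observe this effect is to count the machine's TCP connections that are currently in TIME_WAIT before and after running the loop. The sketch below assumes nothing beyond System.Net.NetworkInformation; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

public static class TimeWaitProbe
{
    // Counts TCP connections on this machine that are currently in TIME_WAIT.
    public static int CountTimeWait()
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.State == TcpState.TimeWait);
    }
}
```

Calling TimeWaitProbe.CountTimeWait() before and after the request loop gives a rough idea of how many sockets each disposed client leaves behind; the number also includes connections from other processes.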

I'm aware of SocketsHttpHandler and IHttpClientFactory, but they don't solve the problem of using a different proxy per request.

var socketsHandler = new SocketsHttpHandler
{
    PooledConnectionLifetime = TimeSpan.FromMinutes(10),
    PooledConnectionIdleTimeout = TimeSpan.FromMinutes(5),
    MaxConnectionsPerServer = 10
};

// Cannot set a different proxy for each request
var client = new HttpClient(socketsHandler);

What is the most sensible approach here?

Recommended answer

First of all, I want to mention that @Stephen Cleary's example works fine if the proxies are known at compile time, but in my case they are only known at runtime. I forgot to mention that in the question, so that's my fault.

Thanks to @aepot for pointing those things out.

This is the solution I came up with (credits to @mcont):

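using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using Flurl.Http;

/// <summary>
/// A wrapper class for <see cref="FlurlClient"/>, which solves socket exhaustion and DNS recycling.
/// </summary>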
public class FlurlClientManager
{
    /// <summary>
    /// Static collection, which stores the clients that are going to be reused.
    /// </summary>
    private static readonly ConcurrentDictionary<string, IFlurlClient> _clients = new ConcurrentDictionary<string, IFlurlClient>();

    /// <summary>
    /// Gets the available clients.
    /// </summary>
    /// <returns></returns>
    public ConcurrentDictionary<string, IFlurlClient> GetClients()
        => _clients;

    /// <summary>
    /// Creates a new client or gets an existing one.
    /// </summary>
    /// <param name="clientName">The client name.</param>
    /// <param name="proxy">The proxy URL.</param>
    /// <returns>The <see cref="FlurlClient"/>.</returns>
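    public IFlurlClient CreateOrGetClient(string clientName, string proxy = null)
    {
        // Use the factory overload so a new client (and handler) is only
        // created when the key is missing or the cached client was disposed.
        return _clients.AddOrUpdate(
            clientName,
            _ => CreateClient(proxy),
            (_, client) => client.IsDisposed ? CreateClient(proxy) : client);
    }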

    /// <summary>
    /// Disposes a client. This leaves a socket in TIME_WAIT state for 240 seconds but it's necessary in case a client has to be removed from the list.
    /// </summary>
    /// <param name="clientName">The client name.</param>
    /// <returns>Returns true if the operation is successful.</returns>
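    public bool DeleteClient(string clientName)
    {
        // Remove first, then dispose; this avoids a KeyNotFoundException
        // when no client is registered under the given name.
        if (!_clients.TryRemove(clientName, out var client))
        {
            return false;
        }

        client.Dispose();
        return true;
    }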

    private IFlurlClient CreateClient(string proxy = null)
    {
        var handler = new SocketsHttpHandler()
        {
            Proxy = proxy != null ? new WebProxy(proxy, true) : null,
            PooledConnectionLifetime = TimeSpan.FromMinutes(10)
        };

        var client = new HttpClient(handler);

        return new FlurlClient(client);
    }
}

A proxy per request means an additional socket for each request (another HttpClient instance).

In the solution above, a ConcurrentDictionary stores the clients so that I can reuse them, which is the whole point of HttpClient. I can use the same proxy for 5 requests before it gets blocked by API rate limits. I forgot to mention that in the question as well.
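
The same idea can be sketched without Flurl, keyed by proxy URL rather than by a client name. This is only an illustration under assumptions: ProxyClientPool and its members are hypothetical names, and the 5-requests-per-proxy budget mentioned above is not modelled here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;

public sealed class ProxyClientPool
{
    // Lazy<T> ensures that only one HttpClient is ever built per proxy,
    // even if several threads ask for the same proxy at the same time.
    private readonly ConcurrentDictionary<string, Lazy<HttpClient>> _clients =
        new ConcurrentDictionary<string, Lazy<HttpClient>>();

    // Returns the cached client for this proxy, creating it on first use.
    public HttpClient GetClient(string proxyUrl)
    {
        return _clients.GetOrAdd(proxyUrl, url => new Lazy<HttpClient>(() =>
        {
            var handler = new SocketsHttpHandler
            {
                Proxy = new WebProxy(url, true),
                UseProxy = true,
                // Recycle pooled connections periodically so DNS changes are picked up.
                PooledConnectionLifetime = TimeSpan.FromMinutes(10)
            };
            return new HttpClient(handler);
        })).Value;
    }

    // Removes the client for a proxy and disposes it if it was ever created.
    public bool Remove(string proxyUrl)
    {
        if (!_clients.TryRemove(proxyUrl, out var lazy))
        {
            return false;
        }

        if (lazy.IsValueCreated)
        {
            lazy.Value.Dispose();
        }
        return true;
    }
}
```

Requests that share a proxy then share one client and its connection pool, while Remove can be called once a proxy has been blocked and should no longer be used.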

As you've seen, there are two ways to solve socket exhaustion and DNS recycling: IHttpClientFactory and SocketsHttpHandler. The first one doesn't suit my case, because the proxies I'm using are only known at runtime, not at compile time. The solution above uses the second.

For those who have the same problem, you can read the following issue on GitHub. It explains everything.

I'm open to improvements, so feel free to poke me.
