Maximize number of parallel requests (aiohttp)


Problem description


tl;dr: how do I maximize the number of HTTP requests I can send in parallel?

I am fetching data from multiple URLs with the aiohttp library. I'm testing its performance, and I've observed that somewhere in the process there is a bottleneck where running more URLs at once just doesn't help.

I am using this code:

import asyncio
import aiohttp

async def fetch(url, session):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
    try:
        async with session.get(
            url, headers=headers, 
            ssl = False, 
            timeout = aiohttp.ClientTimeout(
                total=None, 
                sock_connect = 10, 
                sock_read = 10
            )
        ) as response:
            content = await response.read()
            return (url, 'OK', content)
    except Exception as e:
        print(e)
        return (url, 'ERROR', str(e))

async def run(url_list):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in url_list:
            task = asyncio.ensure_future(fetch(url, session))
            tasks.append(task)
        responses = asyncio.gather(*tasks)
        await responses
    return responses

loop = asyncio.get_event_loop()
asyncio.set_event_loop(loop)
task = asyncio.ensure_future(run(url_list))
loop.run_until_complete(task)
result = task.result().result()

Running this with a url_list of varying length (tests against https://httpbin.org/delay/2), I see that adding more URLs to be run at once helps only up to ~100 URLs; after that the total time starts to grow proportionally to the number of URLs (in other words, the time per URL does not decrease). This suggests that something fails when trying to process these all at once. In addition, with more URLs in one batch I occasionally receive connection timeout errors.

  • Why is it happening? What exactly limits the speed here?
  • How can I check the maximum number of parallel requests I can send on a given computer? (I mean an exact number, not an approximation obtained by trial and error as above.)
  • What can I do to increase the number of requests processed at once?

I am running this on Windows.

EDIT in response to comment:

This is the same data with the limit set to None. There is only a slight improvement at the end, and there are many connection timeout errors with 400 URLs sent at once. I ended up using limit = 200 on my actual data.

Solution

By default aiohttp limits the number of simultaneous connections to 100. It does this by setting a default limit on the TCPConnector object used by the ClientSession. You can bypass it by creating a custom connector and passing it to the session:

connector = aiohttp.TCPConnector(limit=None)
async with aiohttp.ClientSession(connector=connector) as session:
    # ...

Note, however, that you probably don't want to set this number too high: your network capacity, CPU, RAM, and the target server have their own limits, and trying to make an enormous number of connections can lead to increasing failures.

The optimal number can probably be found only through experimentation on a concrete machine.
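
For example, one rough way to experiment is to time the same batch of URLs under a few different connector limits. This is only a sketch, assuming the fetch coroutine from the question above; the candidate limits and the use of https://httpbin.org/delay/2 as a test endpoint are arbitrary choices, not recommendations:

import asyncio
import time
import aiohttp

async def run_with_limit(url_list, limit):
    # Cap concurrency via the connector instead of relying on the default (100).
    connector = aiohttp.TCPConnector(limit=limit)
    async with aiohttp.ClientSession(connector=connector) as session:
        # gather accepts coroutines directly, so no ensure_future is needed here.
        return await asyncio.gather(*(fetch(url, session) for url in url_list))

async def benchmark(url_list):
    for limit in (50, 100, 200, 400):  # arbitrary candidates to compare
        start = time.perf_counter()
        await run_with_limit(url_list, limit)
        print(f"limit={limit}: {time.perf_counter() - start:.1f} s")

# e.g. asyncio.run(benchmark(['https://httpbin.org/delay/2'] * 400))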


Unrelated:

You don't have to create tasks without reason. Most asyncio APIs accept regular coroutines. For example, your last lines of code can be altered this way:

loop = asyncio.get_event_loop()
loop.run_until_complete(run(url_list))

Or even just asyncio.run(run(url_list)) (docs) if you're using Python 3.7+.
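
Putting the two suggestions together, a minimal end-to-end sketch might look like the following (assuming the fetch coroutine from the question and Python 3.7+; the limit of 200 simply mirrors what the asker settled on, not a universal recommendation):

import asyncio
import aiohttp

async def run(url_list):
    # Raise the default connection limit (100) explicitly instead of disabling it.
    connector = aiohttp.TCPConnector(limit=200)
    async with aiohttp.ClientSession(connector=connector) as session:
        # Pass the coroutines straight to gather; no ensure_future needed.
        return await asyncio.gather(*(fetch(url, session) for url in url_list))

results = asyncio.run(run(url_list))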
