Making 1 million requests with aiohttp/asyncio - literally
Problem description
I followed this tutorial: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html and everything works fine when I am doing around 50,000 requests. But I need to make 1 million API calls, and then I have a problem with this code:
url = "http://some_url.com/?id={}"
tasks = set()
sem = asyncio.Semaphore(MAX_SIM_CONNS)
for i in range(1, LAST_ID + 1):
    task = asyncio.ensure_future(bound_fetch(sem, url.format(i)))
    tasks.add(task)
responses = asyncio.gather(*tasks)
return await responses
Because Python needs to create 1 million tasks, it basically just lags and then prints a Killed
message in the terminal. Is there any way to use a generator instead of a pre-made set (or list) of URLs? Thanks.
Recommended answer
Schedule all 1 million tasks at once
This is the code you are talking about. It uses up to 3 GB of RAM, so it is quite possible that it will be terminated by the operating system if you have little free memory.
import asyncio
from aiohttp import ClientSession

MAX_SIM_CONNS = 50
LAST_ID = 10**6

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def bound_fetch(sem, url, session):
    async with sem:
        await fetch(url, session)

async def fetch_all():
    url = "http://localhost:8080/?id={}"
    tasks = set()
    async with ClientSession() as session:
        sem = asyncio.Semaphore(MAX_SIM_CONNS)
        for i in range(1, LAST_ID + 1):
            task = asyncio.create_task(bound_fetch(sem, url.format(i), session))
            tasks.add(task)
        return await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(fetch_all())
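Side note (an addition, not from the original answer): aiohttp can also cap the number of simultaneous connections at the connection-pool level via TCPConnector(limit=...). This does not fix the memory cost of creating a million Task objects up front, so the sketch below deliberately uses a small id range; it only illustrates the connector-level limit, assuming the same localhost test server:

import asyncio
from aiohttp import ClientSession, TCPConnector

MAX_SIM_CONNS = 50

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def fetch_some():
    # TCPConnector(limit=...) bounds how many connections are open at once,
    # so an explicit Semaphore is not strictly required here
    connector = TCPConnector(limit=MAX_SIM_CONNS)
    async with ClientSession(connector=connector) as session:
        urls = ("http://localhost:8080/?id={}".format(i) for i in range(1, 1001))
        tasks = [asyncio.create_task(fetch(u, session)) for u in urls]
        return await asyncio.gather(*tasks)

if __name__ == '__main__':
    asyncio.run(fetch_some())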
Use a queue to streamline the work
This is my suggestion for how to use asyncio.Queue to pass URLs to worker tasks. The queue is filled on demand; there is no pre-made list of URLs.
It takes only 30 MB of RAM :)
import asyncio
from aiohttp import ClientSession

MAX_SIM_CONNS = 50
LAST_ID = 10**6

async def fetch(url, session):
    async with session.get(url) as response:
        return await response.read()

async def fetch_worker(url_queue):
    async with ClientSession() as session:
        while True:
            url = await url_queue.get()
            try:
                if url is None:
                    # all work is done
                    return
                response = await fetch(url, session)
                # ...do something with the response
            finally:
                # calling task_done() is necessary for url_queue.join() to work correctly
                url_queue.task_done()

async def fetch_all():
    url = "http://localhost:8080/?id={}"
    url_queue = asyncio.Queue(maxsize=100)
    worker_tasks = []
    for i in range(MAX_SIM_CONNS):
        wt = asyncio.create_task(fetch_worker(url_queue))
        worker_tasks.append(wt)
    for i in range(1, LAST_ID + 1):
        await url_queue.put(url.format(i))
    for i in range(MAX_SIM_CONNS):
        # tell the workers that the work is done
        await url_queue.put(None)
    await url_queue.join()
    await asyncio.gather(*worker_tasks)

if __name__ == '__main__':
    asyncio.run(fetch_all())
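If the responses need to be kept rather than discarded at the "# ...do something with the response" line, one minimal sketch (my addition, with a hypothetical results list passed into the worker) is to append to a shared list; all workers run in the same event loop, so no locking is needed:

async def fetch_worker(url_queue, results):
    async with ClientSession() as session:
        while True:
            url = await url_queue.get()
            try:
                if url is None:
                    return
                body = await fetch(url, session)
                # keep the response body instead of discarding it
                results.append(body)
            finally:
                url_queue.task_done()

# and in fetch_all():
#     results = []
#     wt = asyncio.create_task(fetch_worker(url_queue, results))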