Why doesn't asyncio always use executors?
I have to send a lot of HTTP requests; once all of them have returned, the program can continue. Sounds like a perfect match for asyncio. A bit naively, I wrapped my calls to requests in an async function and gave them to asyncio. This doesn't work.
After searching online, I found two solutions:
- use a library like aiohttp, which is made to work with asyncio
- wrap the blocking code in a call to run_in_executor
To understand this better, I wrote a small benchmark. The server side is a Flask program that waits 0.1 seconds before answering a request.
from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    time.sleep(0.1)  # heavy calculations here :)
    return 'Hello World!'

if __name__ == '__main__':
    app.run()
The client is my benchmark:
import requests
from time import perf_counter, sleep

# this is the baseline, sequential calls to requests.get
start = perf_counter()
for i in range(10):
    r = requests.get("http://127.0.0.1:5000/")
stop = perf_counter()
print(f"synchronous took {stop-start} seconds")  # 1.062 secs

# now the naive asyncio version
import asyncio
loop = asyncio.get_event_loop()

async def get_response():
    r = requests.get("http://127.0.0.1:5000/")

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[get_response() for i in range(10)]))
stop = perf_counter()
print(f"asynchronous took {stop-start} seconds")  # 1.049 secs

# the fast asyncio version
start = perf_counter()
loop.run_until_complete(asyncio.gather(
    *[loop.run_in_executor(None, requests.get, 'http://127.0.0.1:5000/')
      for i in range(10)]))
stop = perf_counter()
print(f"asynchronous (executor) took {stop-start} seconds")  # 0.122 secs

# finally, aiohttp
import aiohttp

async def get_response(session):
    async with session.get("http://127.0.0.1:5000/") as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        await get_response(session)

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[main() for i in range(10)]))
stop = perf_counter()
print(f"aiohttp took {stop-start} seconds")  # 0.121 secs
So, an intuitive implementation with asyncio doesn't deal with blocking I/O code. But if you use asyncio correctly, it is just as fast as the special aiohttp framework. The docs for coroutines and tasks don't really mention this. Only if you read up on loop.run_in_executor() do you find:
# File operations (such as logging) can block the
# event loop: run them in a thread pool.
I was surprised by this behaviour. The purpose of asyncio is to speed up blocking I/O calls. Why is an additional wrapper, run_in_executor, necessary to do this?
The whole selling point of aiohttp seems to be its support for asyncio. But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?
> But as far as I can see, the requests module works perfectly - as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?
Running code in an executor means running it in OS threads. aiohttp and similar libraries allow running non-blocking code without OS threads, using coroutines only.
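The distinction matters because a coroutine only hands control to other tasks at an await point; a blocking call like requests.get() never yields, which is exactly why the naive version in the question ran sequentially. A minimal sketch, with time.sleep standing in for the blocking HTTP call:

```python
import asyncio
import time

async def blocking_task():
    # time.sleep blocks the whole thread; the event loop cannot
    # switch to another coroutine while it runs.
    time.sleep(0.1)

async def yielding_task():
    # asyncio.sleep suspends this coroutine and lets others run.
    await asyncio.sleep(0.1)

async def timed(tasks):
    start = time.perf_counter()
    await asyncio.gather(*tasks)
    return time.perf_counter() - start

blocking = asyncio.run(timed([blocking_task() for _ in range(5)]))
yielding = asyncio.run(timed([yielding_task() for _ in range(5)]))
# the blocking variant runs its five sleeps one after another;
# the yielding variant overlaps them all
print(f"blocking: {blocking:.2f}s, yielding: {yielding:.2f}s")
```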
If you don't have much work, the difference between OS threads and coroutines is not significant, especially compared to the bottleneck - the I/O operations. But once you have a lot of work, you will notice that OS threads perform relatively worse because of expensive context switching.
For example, when I change your code to time.sleep(0.001) and range(100), my machine shows:
asynchronous (executor) took 0.21461606299999997 seconds
aiohttp took 0.12484742700000007 seconds
And this difference will only grow as the number of requests increases.
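Besides context switching, there is a second reason the gap widens: run_in_executor(None, ...) lazily creates a default ThreadPoolExecutor of bounded size, so a large enough batch of blocking calls queues behind the pool, while pure coroutines have no such cap. A small sketch of the formula ThreadPoolExecutor has used since Python 3.8:

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Size of the default executor that run_in_executor(None, ...) creates
# lazily: min(32, cpu_count + 4) since Python 3.8.
default_workers = min(32, (os.cpu_count() or 1) + 4)
print(default_workers)

# With e.g. 100 blocking calls, only `default_workers` of them run at
# a time. If threads are really wanted, the pool can be widened:
# loop.set_default_executor(ThreadPoolExecutor(max_workers=100))
```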
> The purpose of asyncio is to speed up blocking io calls.
Nope, the purpose of asyncio is to provide a convenient way to control execution flow. asyncio lets you choose how that flow works - based on coroutines and OS threads (when you use an executor) or on pure coroutines (as aiohttp does).
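With pure coroutines that flow can be shaped explicitly and cheaply. A sketch capping in-flight work with asyncio.Semaphore, where asyncio.sleep stands in for an aiohttp request:

```python
import asyncio

async def fetch(i, sem):
    # The semaphore caps how many coroutines are "in flight" at once -
    # execution flow is controlled explicitly, with no OS threads.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for an aiohttp request
        return i

async def main():
    sem = asyncio.Semaphore(10)
    return await asyncio.gather(*[fetch(i, sem) for i in range(100)])

results = asyncio.run(main())
print(results[:5])  # gather preserves submission order
```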
It's aiohttp's purpose to speed things up, and it copes with that task as shown above :)
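As a side note on the wrapper's ergonomics: since Python 3.9 there is also asyncio.to_thread(), which submits a call to the same default executor with a slightly friendlier signature. A sketch, with time.sleep standing in for requests.get:

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.05)  # stand-in for requests.get(...)
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # pre-3.9 spelling: submit to the default ThreadPoolExecutor
    a = loop.run_in_executor(None, blocking_io)
    # 3.9+ spelling: same effect, and it forwards *args/**kwargs
    b = asyncio.to_thread(blocking_io)
    return await asyncio.gather(a, b)

results = asyncio.run(main())
print(results)
```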