Parallel asynchronous IO in Python's coroutines


Problem description

Simple example: I need to make two unrelated HTTP requests in parallel. What's the simplest way to do that? I expect it to look something like this:

async def do_the_job():
    with aiohttp.ClientSession() as session:
        coro_1 = session.get('http://httpbin.org/get')
        coro_2 = session.get('http://httpbin.org/ip')
        return combine_responses(await coro_1, await coro_2)

In other words, I want to initiate IO operations and wait for their results so they effectively run in parallel. This can be achieved with asyncio.gather:

async def do_the_job():
    with aiohttp.ClientSession() as session:
        coro_1 = session.get('http://example.com/get')
        coro_2 = session.get('http://example.org/tp')
        return combine_responses(*(await asyncio.gather(coro_1, coro_2)))

Next, I want to have some complex dependency structure. I want to start operations when I have all the prerequisites for them and get results when I need them. Here asyncio.ensure_future helps: it turns a coroutine into a separate task that is managed by the event loop on its own:

async def do_the_job():
    with aiohttp.ClientSession() as session:
        fut_1 = asyncio.ensure_future(session.get('http://httpbin.org/ip'))
        coro_2 = session.get('http://httpbin.org/get')
        coro_3 = session.post('http://httpbin.org/post', data=(await coro_2))
        coro_3_result = await coro_3
        return combine_responses(await fut_1, coro_3_result)

Is it true that, to achieve parallel non-blocking IO with coroutines in my logic flow, I have to use either asyncio.ensure_future or asyncio.gather (which actually uses asyncio.ensure_future)? Is there a less "verbose" way?

Is it true that developers normally have to think about which coroutines should become separate tasks and use the aforementioned functions to gain optimal performance?

Is there a point in using coroutines without multiple tasks in the event loop?

How "heavy" are event loop tasks in real life? Surely, they're "lighter" than OS threads or processes. To what extent should I strive for minimal possible number of such tasks?

Solution

I need to make two unrelated HTTP requests in parallel. What's the simplest way to do that?

import asyncio
import aiohttp


async def request(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()


async def main():
    results = await asyncio.gather(
        request('http://httpbin.org/delay/1'),
        request('http://httpbin.org/delay/1'),
    )
    print(len(results))


loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()

Yes, you can achieve concurrency with asyncio.gather, or by creating a task with asyncio.ensure_future.
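
For reference, a minimal sketch of the same two parallel requests done with explicit tasks instead of asyncio.gather. It reuses the request helper and httpbin URLs from the snippet above; on Python 3.7+ asyncio.create_task and asyncio.run can replace ensure_future and the manual loop handling:

import asyncio
import aiohttp


async def request(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()


async def main():
    # Wrap the coroutines in tasks; they start running concurrently
    # as soon as the event loop gets control.
    task_1 = asyncio.ensure_future(request('http://httpbin.org/delay/1'))
    task_2 = asyncio.ensure_future(request('http://httpbin.org/delay/1'))

    # Await the tasks only when the results are actually needed.
    results = [await task_1, await task_2]
    print(len(results))


loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()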

Next, I want to have some complex dependency structure. I want to start operations when I have all prerequisites for them and get results when I need them.

While the code you provided will do the job, it would be nicer to split the concurrent flows into separate coroutines and again use asyncio.gather:

import asyncio
import aiohttp


async def request(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()


async def get_ip():
    return await request('http://httpbin.org/ip')


async def post_from_get():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get') as resp:
            get_res = await resp.text()
        async with session.post('http://httpbin.org/post', data=get_res) as resp:
            return await resp.text()


async def main():
    results = await asyncio.gather(
        get_ip(),
        post_from_get(),
    )
    print(len(results))


loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
    loop.run_until_complete(loop.shutdown_asyncgens())
finally:
    loop.close()

Is it true that developers normally have to think about which coroutines should become separate tasks and use the aforementioned functions to gain optimal performance?

Since you use asyncio, you probably want to run some jobs concurrently to gain performance, right? asyncio.gather is a way of saying: "run these jobs concurrently to get their results faster".

If you don't have to think about which jobs should run concurrently to gain performance, you may be fine with plain synchronous code.
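
For comparison, a sketch of the plain synchronous equivalent of the first snippet, using urllib from the standard library purely for illustration; the two requests simply run one after the other:

from urllib.request import urlopen


def request(url):
    # Blocking HTTP GET; the second request doesn't start until this one finishes.
    with urlopen(url) as resp:
        return resp.read().decode()


results = [
    request('http://httpbin.org/delay/1'),
    request('http://httpbin.org/delay/1'),
]
print(len(results))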

Is there a point in using coroutines without multiple tasks in the event loop?

In your code you don't have to create tasks manually if you don't want to: neither snippet in this answer uses asyncio.ensure_future. But internally asyncio uses tasks all the time (for example, as you noted, asyncio.gather itself uses tasks).
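
A tiny sketch just to make that concrete: wrapping a bare coroutine with asyncio.ensure_future (which asyncio.gather does for you internally) produces a Task object that the event loop schedules on its own:

import asyncio


async def answer():
    return 42


async def main():
    # ensure_future turns a bare coroutine into a Task; gather does the
    # same for every coroutine passed to it.
    task = asyncio.ensure_future(answer())
    print(isinstance(task, asyncio.Task))  # True
    print(await task)                      # 42


loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()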

How "heavy" are event loop tasks in real life? Surely, they're "lighter" than OS threads or processes. To what extent should I strive for minimal possible number of such tasks?

The main bottleneck in an async program is (almost always) the network: you shouldn't worry about the number of asyncio coroutines/tasks at all.

