asyncio: why isn't it non-blocking by default

Problem Description

By default, asyncio runs coroutines synchronously. If they contain blocking IO code, they still wait for it to return. A way around this is loop.run_in_executor(), which offloads the code to a thread pool. If a thread blocks on IO, another thread can start executing, so you don't waste time waiting for IO calls.
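For illustration, a minimal sketch of that run_in_executor() pattern (not from the original post; the fetch helper and URLs are made up, and it assumes the third-party requests package is installed):

import asyncio
import requests  # assumed installed; any blocking HTTP client works here

async def fetch(url):
    loop = asyncio.get_running_loop()
    # None means: use the loop's default ThreadPoolExecutor.
    response = await loop.run_in_executor(None, requests.get, url)
    return response.text

async def main():
    # Both requests run in worker threads, so they overlap instead of
    # executing one after the other.
    pages = await asyncio.gather(
        fetch("https://example.com"),
        fetch("https://example.org"),
    )
    print([len(p) for p in pages])

asyncio.run(main())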

If you use asyncio without executors, you lose those speedups. So I was wondering: why do you have to use executors explicitly? Why not enable them by default? (In the following I'll focus on HTTP requests, but they really only serve as an example; I'm interested in the general principles.)

After some searching I found aiohttp. It's a library that essentially offers a combination of asyncio and requests: non-blocking HTTP calls. With executors, asyncio and requests behave pretty much just like aiohttp. Is there a reason to implement a new library? Do you pay a performance penalty for using executors?
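As a rough sketch of what "non-blocking HTTP calls with only coroutines" looks like in practice (the fetch helper and URLs are placeholders; aiohttp.ClientSession is part of aiohttp's documented API):

import asyncio
import aiohttp

async def fetch(session, url):
    # The request itself is a coroutine; no executor threads involved.
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(
            fetch(session, "https://example.com"),
            fetch(session, "https://example.org"),
        )
    print([len(p) for p in pages])

asyncio.run(main())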

That question was answered in Why doesn't asyncio always use executors?, where Mikhail Gerasimov explained to me that executors spin up OS threads, and those can become expensive. So it makes sense not to have them as the default behaviour. aiohttp is better than using the requests module in an executor, since it offers non-blocking code with only coroutines.

Which brings me to this question. aiohttp advertises itself as:

Asynchronous HTTP Client/Server for asyncio and Python.

So aiohttp is based on asyncio? Why doesn't asyncio offer non-blocking code with only coroutines then? That would be the ideal default.

Or did aiohttp implement this new event loop (without OS threads) itself? In that case I don't understand why they advertise themselves as based on asyncio. Async/await is a language feature; asyncio is an event loop. And if aiohttp has its own event loop, there should be little intersection with asyncio. Actually, I would argue that such an event loop would be a much bigger feature than HTTP requests.

Solution

asyncio is asynchronous because coroutines cooperate voluntarily. All asyncio code must be written with cooperation in mind; that is the entire point. Otherwise you may as well use threading exclusively to achieve concurrency.

You can't just run 'blocking' functions (non-coroutine functions or methods that won't cooperate) in an executor automatically, because you can't assume that such code can safely run in a separate executor thread, or even that it needs to be run in an executor at all.

The Python standard library is full of really useful code that asyncio projects will want to make use of. The majority of the standard library consists of regular, 'blocking' function and class definitions. They do their work quickly, so even though they 'block', they return in a reasonable time.

But most of that code is also not thread-safe, and it usually doesn't need to be. If asyncio ran all such code in an executor automatically, you could no longer use non-thread-safe functions. Besides, creating a thread to run synchronous code in is not free: creating the thread object costs time, and your OS won't let you run an unlimited number of threads either. Loads of standard-library functions and methods are fast; why would you want to run str.splitlines() or urllib.parse.quote() in a separate thread when it would be much quicker to just execute the code and be done with it?
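A tiny sketch of that point (the handle coroutine is invented for the example): these 'blocking' helpers return in microseconds, so calling them inline inside a coroutine is the sensible choice, and offloading them to an executor would only add thread overhead.

import asyncio
import urllib.parse

async def handle(raw):
    lines = raw.splitlines()                       # 'blocking', but effectively instant
    return [urllib.parse.quote(line) for line in lines]

print(asyncio.run(handle("a b\nc d")))             # ['a%20b', 'c%20d']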

You may say that those functions are not blocking by your standards. You didn't define 'blocking' here, but 'blocking' just means: won't voluntarily yield. If we narrow this down to 'won't voluntarily yield when it has to wait for something and the computer could be doing something else instead', then the next question is: how would you detect that it should have yielded?

The answer to that is that you can't. time.sleep() is a blocking function you would want to yield to the loop for, but it is a C function call. Python can't know that time.sleep() is going to block for a long time, because a function that calls time.sleep() only looks up the name time in the global namespace, and then the attribute sleep on the result of that lookup, when it actually executes the time.sleep() expression. Because Python's namespaces can be altered at any point during execution, you can't know what time.sleep() will do until you actually execute the function.
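A small illustrative sketch of the difference (the coroutine names are made up): time.sleep() stalls the whole event loop, while await asyncio.sleep() yields to it so other tasks can run during the wait.

import asyncio
import time

async def bad():
    time.sleep(1)           # opaque C call; the loop cannot tell that it blocks
    print("bad done")

async def good():
    await asyncio.sleep(1)  # yields a future to the loop; other tasks run meanwhile
    print("good done")

async def main():
    # The two good() tasks overlap and finish in about one second total.
    # Mixing in bad() would stall everything for its full second.
    await asyncio.gather(good(), good())

asyncio.run(main())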

You could say that the time.sleep() implementation should then automatically yield when called, but you would have to start identifying all such functions. There is no limit to the number of places you would have to patch, and you can never know all of them. Certainly not for third-party libraries. For example, the python-adb project gives you a synchronous USB connection to an Android device, using the libusb1 library. That's not a standard I/O code path, so how would Python know that creating and using those connections are good places to yield?

So you can't just assume that code needs to be run in an executor; not all code can be run in an executor, because it may not be thread-safe; and Python can't detect when code is blocking and should really be yielding.

So how do coroutines under asyncio cooperate? By using a task object per logical piece of code that needs to run concurrently with other tasks, and by using future objects to signal to the task that the current logical piece of code wants to cede control to other tasks. That's what makes asyncio code asynchronous: voluntarily ceding control. When the loop gives control to one task out of many, the task executes a single 'step' of the coroutine call chain, until that call chain produces a future object, at which point the task adds a wakeup callback to the future object's 'done' callback list and returns control to the loop. At some later point, when the future is marked done, the wakeup callback is run and the task executes another step of the coroutine call chain.
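A toy sketch of that mechanism, using only the public asyncio API (the waiter coroutine is invented for the example): a task awaits a bare future and is only woken up once something marks that future done.

import asyncio

async def waiter(fut):
    print("waiting")
    result = await fut            # the task registers a wakeup callback on fut and yields
    print("woke up with", result)

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    task = asyncio.create_task(waiter(fut))
    await asyncio.sleep(0)        # give the loop a chance to run waiter up to its await
    fut.set_result(42)            # 'something else' marks the future as done
    await task                    # the wakeup callback has resumed the task

asyncio.run(main())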

Something else is responsible for marking the future objects as done. When you use asyncio.sleep(), a callback to be run at a specific time is handed to the loop, and that callback marks the asyncio.sleep() future as done. When you use a stream object to perform I/O, then (on UNIX) the loop uses select calls to detect when the I/O operation is done and it is time to wake up a future object. And when you use a lock or other synchronisation primitive, the primitive maintains a pile of futures to mark as 'done' when appropriate (waiting for a lock? Add a future to the pile. Freeing a held lock? Pick the next future from the pile and mark it as done, so the next task that was waiting for the lock can wake up and acquire it, and so on).
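As a rough sketch, something very close to asyncio.sleep() can be assembled from exactly those pieces. my_sleep below is a hypothetical helper written for illustration, not the real implementation:

import asyncio

async def my_sleep(delay):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    loop.call_later(delay, fut.set_result, None)  # the loop marks the future done later
    await fut                                     # the task yields until that happens

asyncio.run(my_sleep(0.1))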

Putting synchronous code that blocks into an executor is just another form of cooperation here. When using asyncio in a project, it is up to the developer to use the tools provided to make sure their coroutines cooperate. You are free to use blocking open() calls on files instead of using streams, and you are free to use an executor when you know the code needs to run in a separate thread to avoid blocking the loop for too long.
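A small sketch of that developer choice (the helper name and file contents are made up; a throwaway temp file stands in for real data): read a quick file inline, and push a potentially slow read into the default executor.

import asyncio
import os
import tempfile

def read_file(path):
    with open(path) as f:          # plain blocking file I/O
        return f.read()

async def main():
    with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
        tmp.write("hello\n" * 1000)
        path = tmp.name

    loop = asyncio.get_running_loop()
    quick = read_file(path)                                   # fine inline when it is fast
    slow = await loop.run_in_executor(None, read_file, path)  # offload when it could block for long
    print(len(quick), len(slow))
    os.unlink(path)

asyncio.run(main())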

Last but not least, the whole point of using asyncio is to avoid using threading as much as possible. Using threads has downsides: code needs to be thread-safe (control can switch between threads anywhere, so two threads accessing a shared piece of data must do so with care, and 'taking care' can mean the code is slowed down), and threads are scheduled whether they have anything to do or not. Switching control between a fixed number of threads that are all waiting for I/O is a waste of CPU time, whereas the asyncio loop is free to find a task that is not waiting.
