Multiprocessing multithreading GIL?

Question

For several days now I have been researching multiprocessing and multithreading in Python, and I'm confused about many things. I often see people say that the GIL doesn't allow Python code to execute on several CPU cores, but when I write a program that creates many threads, I can see that several CPU cores are active.

First question: what is the GIL, really? Does it actually work? My guess is that when a process creates many threads, the OS distributes the work across multiple CPUs. Am I right?

Another thing: I want to take advantage of all my CPUs. I'm thinking of creating as many processes as there are CPU cores, and having each process create as many threads as there are CPU cores. Am I on the right track?

Recommended answer

To start with, the GIL only ensures that a single CPython bytecode instruction runs at any given time. It does not care which CPU core runs that instruction; that is the job of the OS kernel.

So, going over your questions:

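  1. The GIL is just a piece of code. The CPython virtual machine is the process that first compiles your code to CPython bytecode, but its normal job is to interpret that bytecode. The GIL is the piece of code that ensures a single bytecode instruction runs at a time, no matter how many threads are running. CPython bytecode instructions are what the stack-based virtual machine executes. So, in a way, the GIL ensures that only one thread holds the GIL at any given point in time. (It also keeps releasing the GIL for other threads so that they do not starve.)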
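To make "one bytecode instruction at a time" more concrete, here is a minimal sketch that uses the standard `dis` module to list the bytecode behind a single line of Python, and `sys.getswitchinterval()` to read how often CPython asks the running thread to give up the GIL. The function name `increment` is made up for illustration, and the exact opcodes depend on the CPython version.

```python
import dis
import sys


def increment(counter):
    # One source line, but several bytecode instructions underneath.
    counter += 1
    return counter


# Collect the opcode names CPython generated for the function body.
instructions = [ins.opname for ins in dis.get_instructions(increment)]
print(instructions)

# Interval, in seconds, after which the interpreter requests that the
# running thread release the GIL so that other threads get a turn.
print(sys.getswitchinterval())
```

Because the GIL may be handed over between bytecode instructions, a line such as `counter += 1` on shared state is not guaranteed to be atomic across threads, which is why locks are still needed for shared data.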

Now, coming to your actual confusion. You mention that when you run a program with many threads, you can see multiple (maybe all) CPU cores firing up. I did some experimentation and found that your observation is correct, but the behaviour is similar in a non-threaded version too.

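import time
from multiprocessing.pool import ThreadPool

def do_nothing(i):
    time.sleep(0.0001)
    return i*2

# Multithreaded version: 20 worker threads share the 10000 calls.
ThreadPool(20).map(do_nothing, range(10000))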

import time

def do_nothing(i):
    time.sleep(0.0001)
    return i*2

# Single-threaded version: the same 10000 calls, one after another.
[do_nothing(i) for i in range(10000)]

The first one is multithreaded and the second one is not. When you compare the CPU usage of the two programs, you will find that in both cases multiple CPU cores fire up. So what you noticed, although correct, has little to do with the GIL or threading. CPU usage rises on multiple cores simply because the OS kernel distributes the execution of code across different cores based on availability.
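As a rough way to check this yourself, the sketch below records which process and which thread handled each call. The helper name `where_am_i`, the pool size of 4 and the 200 tasks are arbitrary choices for illustration. It only shows that all worker threads live in one process; which core each thread runs on is decided by the OS and is not visible from this code.

```python
import os
import threading
import time
from multiprocessing.pool import ThreadPool


def where_am_i(i):
    # Sleeping releases the GIL, so other worker threads can run meanwhile.
    time.sleep(0.001)
    return i * 2, os.getpid(), threading.get_ident()


with ThreadPool(4) as pool:
    records = pool.map(where_am_i, range(200))

results = [r[0] for r in records]
pids = {r[1] for r in records}        # process ids seen by the tasks
thread_ids = {r[2] for r in records}  # worker thread ids seen by the tasks

print(f"{len(pids)} process, {len(thread_ids)} worker thread(s)")
```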

Your last question is more of an experimental matter, since different programs have different CPU and I/O usage. You just have to be aware of the cost of creating a thread and a process, and of how the GIL and the Python virtual machine (PVM) work, and then tune the number of threads and processes to get the maximum performance.
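Since this comes down to measuring, here is a minimal timing sketch, assuming a made-up CPU-bound task (`count_primes`) and an arbitrary workload of four equal jobs. It runs the same work sequentially and through a `ThreadPoolExecutor` so you can compare the two on your own machine. On a standard CPython build with the GIL, the threaded run of CPU-bound work is usually not faster, but the numbers depend on your interpreter and hardware.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed(fn):
    # Run fn once and return its result together with the elapsed seconds.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


LIMITS = [20_000] * 4  # hypothetical workload: four identical jobs


def run_sequential():
    return [count_primes(x) for x in LIMITS]


def run_threaded(workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, LIMITS))


seq_result, seq_time = timed(run_sequential)
thr_result, thr_time = timed(run_threaded)
print(f"sequential: {seq_time:.3f}s, threaded: {thr_time:.3f}s")
```

For CPU-bound work like this, the usual next experiment is to swap in `concurrent.futures.ProcessPoolExecutor`, which has the same `map` interface but runs the jobs in separate processes, each with its own GIL. Threads tend to pay off mainly when the work spends its time waiting on I/O.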

You can go through this talk by David Beazley to understand how multithreading can make your code perform worse (or better).
