On what CPU cores are my Python processes running?


The setup

I have written a pretty complex piece of software in Python (on a Windows PC). My software basically starts two Python interpreter shells. The first shell starts up (I suppose) when you double-click the main.py file. Within that shell, other threads are started in the following way:

    # Start TCP_thread
    TCP_thread = threading.Thread(name = 'TCP_loop', target = TCP_loop, args = (TCPsock,))
    TCP_thread.start()

    # Start UDP_thread
    UDP_thread = threading.Thread(name = 'UDP_loop', target = UDP_loop, args = (UDPsock,))
    UDP_thread.start()

The Main_thread starts a TCP_thread and a UDP_thread. Although these are separate threads, they all run within one single Python shell.

The Main_thread also starts a subprocess. This is done in the following way:

p = subprocess.Popen(['python', mySubprocessPath], shell=True)

From the Python documentation, I understand that this subprocess runs simultaneously (!) in a separate Python interpreter session/shell. The Main_thread in this subprocess is completely dedicated to my GUI. The GUI starts a TCP_thread for all its communications.

I know that things get a bit complicated. Therefore I have summarized the whole setup in this figure:


I have several questions concerning this setup. I will list them down here:

Question 1 [Solved]

Is it true that a Python interpreter uses only one CPU core at a time to run all the threads? In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?

Answer: yes, this is true. The GIL (Global Interpreter Lock) ensures that all threads run on one CPU core at a time.

Question 2 [Not yet solved]

Do I have a way to track which CPU core it is?

Question 3 [Partly solved]

For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter instance. Is this correct?

Answer: Yes this is correct. At first there was some confusion about whether the following code would create a new Python interpreter instance:

    p = subprocess.Popen(['python', mySubprocessPath], shell = True)

The issue has been clarified. This code indeed starts a new Python interpreter instance.

Will Python be smart enough to make that separate Python interpreter instance run on a different CPU core? Is there a way to track which one, perhaps with some sporadic print statements as well?

Question 4 [New question]

The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):

    # Approach 1(a)
    p = subprocess.Popen(['python', mySubprocessPath], shell = True)

    # Approach 1(b) (J.F. Sebastian)
    p = subprocess.Popen([sys.executable, mySubprocessPath])

    # Approach 2
    p = multiprocessing.Process(target=foo, args=(q,))

The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?

Solution

Q: Is it true that a Python interpreter uses only one CPU core at a time to run all the threads?

No. The GIL and CPU affinity are unrelated concepts. The GIL can be released during blocking I/O operations or long CPU-intensive computations inside a C extension anyway.

If a thread is blocked on the GIL, it is not running on any CPU core, and therefore it is fair to say that pure Python multithreading code may use only one CPU core at a time on the CPython implementation.
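As a small illustration of the first point, threads that spend their time in a blocking call (here time.sleep, standing in for a blocking socket read) overlap almost completely, because each releases the GIL while it waits — a minimal sketch:

```python
import threading
import time

def blocking_io():
    # time.sleep releases the GIL, just as a blocking recv() would
    time.sleep(0.5)

start = time.monotonic()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
# Four 0.5 s waits overlap: total wall time stays near 0.5 s, not 2 s
print(elapsed)
```

If the GIL were held during the sleeps, the four waits would run back to back and take roughly four times as long.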

Q: In other words, will the Python interpreter session 1 (from the figure) run all 3 threads (Main_thread, TCP_thread and UDP_thread) on one CPU core?

I don't think CPython manages CPU affinity implicitly. It likely relies on the OS scheduler to choose where to run a thread. Python threads are implemented on top of real OS threads.

Q: Or is the Python interpreter able to spread them over multiple cores?

To find out the number of usable CPUs:

>>> import os
>>> len(os.sched_getaffinity(0))
16

Again, whether or not threads are scheduled on different CPUs does not depend on the Python interpreter.
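Note that os.sched_getaffinity is only available on some platforms (notably Linux). A portable sketch falls back to os.cpu_count(), which reports the machine's CPUs rather than the process's usable set:

```python
import os

def usable_cpu_count():
    # os.sched_getaffinity(0) returns the set of CPUs the current
    # process may run on; it does not exist on Windows or macOS,
    # so fall back to os.cpu_count() there
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1

print(usable_cpu_count())
```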

Q: Suppose that the answer to Question 1 is 'multiple cores', do I have a way to track on which core each thread is running, perhaps with some sporadic print statements? If the answer to Question 1 is 'only one core', do I have a way to track which one it is?

I imagine the specific CPU may change from one time slot to another. You could look at something like /proc/<pid>/task/<tid>/status on old Linux kernels. On my machine, task_cpu can be read from /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat:

>>> open("/proc/{pid}/stat".format(pid=os.getpid()), 'rb').read().split()[-14]
'4'
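The snippet above can be wrapped into a small Linux-only helper (a sketch, not portable). Splitting from the end with index [-14] sidesteps the second field of /proc/<pid>/stat, the comm field, which may itself contain spaces or parentheses:

```python
import os

def current_cpu(pid=None, tid=None):
    # Linux-only sketch: read the task_cpu field from /proc.
    # With a tid, reads the per-thread stat file; otherwise the
    # process-level one.
    pid = pid if pid is not None else os.getpid()
    if tid is not None:
        path = "/proc/{pid}/task/{tid}/stat".format(pid=pid, tid=tid)
    else:
        path = "/proc/{pid}/stat".format(pid=pid)
    with open(path, "rb") as f:
        return int(f.read().split()[-14])

print(current_cpu())
```

Remember that this is only a snapshot; the scheduler may move the task to another core at the next time slice.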

For a current portable solution, see whether psutil exposes such info.

You could restrict the current process to a set of CPUs:

os.sched_setaffinity(0, {0}) # current process on 0-th core
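Assuming Linux again, a small sketch that pins the current process to core 0, verifies the new mask, and then restores the original affinity:

```python
import os

original = os.sched_getaffinity(0)   # remember the current CPU set

os.sched_setaffinity(0, {0})         # pin the process to core 0
pinned = os.sched_getaffinity(0)
print(pinned)                        # {0}

os.sched_setaffinity(0, original)    # restore the original affinity
```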

Q: For this question we forget about threads, but we focus on the subprocess mechanism in Python. Starting a new subprocess implies starting up a new Python interpreter session/shell. Is this correct?

Yes. The subprocess module creates new OS processes. If you run the python executable, it starts a new Python interpreter. If you run a bash script, no new Python interpreter is created; i.e., running the bash executable does not start a new Python interpreter/session/etc.

Q: Supposing that it is correct, will Python be smart enough to make that separate interpreter session run on a different CPU core? Is there a way to track this, perhaps with some sporadic print statements as well?

See above (i.e., OS decides where to run your thread and there could be OS API that exposes where the thread is run).

multiprocessing.Process(target=foo, args=(q,)).start()

multiprocessing.Process also creates a new OS process (that runs a new Python interpreter).

In reality, my subprocess is another file. So this example won't work for me.

Python uses modules to organize the code. If your code is in another_file.py then import another_file in your main module and pass another_file.foo to multiprocessing.Process.

Nevertheless, how would you compare it to p = subprocess.Popen(..)? Does it matter if I start the new process (or should I say 'python interpreter instance') with subprocess.Popen(..) versus multiprocessing.Process(..)?

multiprocessing.Process() also spawns a new OS process, though through its own machinery (fork or spawn) rather than subprocess.Popen(). multiprocessing provides an API similar to the threading API, and it abstracts away the details of communication between Python processes (how Python objects are serialized to be sent between processes).

If there are no CPU-intensive tasks then you could run your GUI and I/O threads in a single process. If you have a series of CPU-intensive tasks then, to utilize multiple CPUs at once, either use multiple threads with C extensions such as lxml, regex, numpy (or your own, created using Cython) that can release the GIL during long computations, or offload the work into separate processes (a simple way is to use a process pool such as the one provided by concurrent.futures).

Q: The community discussion raised a new question. There are apparently two approaches when spawning a new process (within a new Python interpreter instance):

# Approach 1(a)
p = subprocess.Popen(['python', mySubprocessPath], shell = True)

# Approach 1(b) (J.F. Sebastian)
p = subprocess.Popen([sys.executable, mySubprocessPath])

# Approach 2
p = multiprocessing.Process(target=foo, args=(q,))

"Approach 1(a)" is wrong on POSIX (though it may work on Windows). For portability, use "Approach 1(b)" unless you know you need cmd.exe (pass a string in this case, to make sure that the correct command-line escaping is used).
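A minimal sketch of the portable form, using -c in place of a script path so the example is self-contained:

```python
import subprocess
import sys

# sys.executable is the full path of the currently running interpreter,
# so the child is guaranteed to use the same Python, without shell=True
proc = subprocess.run(
    [sys.executable, '-c', 'print("hello from the child")'],
    capture_output=True, text=True,
)
print(proc.stdout.strip())  # hello from the child
```

Passing an argument list (rather than a single string with shell=True) also avoids any shell-quoting problems with the script path.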

The second approach has the obvious downside that it targets just a function - whereas I need to open up a new Python script. Anyway, are both approaches similar in what they achieve?

subprocess creates new processes, any processes; e.g., you could run a bash script. multiprocessing is used to run Python code in another process. It is more flexible to import a Python module and run its functions than to run it as a script. See Call python script with input with in a python script using subprocess.
