Overhead in creating a thread vs process on Linux

Question

I am trying to answer the question of how much overhead there is in creating a thread vs a process in Python. I modified code from a similar question that basically runs a function with two threads, then runs the same function with two processes, and reports the time.

import time, sys
NUM_RANGE = 100000000

from multiprocessing  import Process
import threading

def timefunc(f):
    t = time.time()
    f()
    return time.time() - t

def multiprocess():
    class MultiProcess(Process):
        def __init__(self):
            Process.__init__(self)

        def run(self):
            # Alter string + test processing speed
            for i in xrange(NUM_RANGE):
                a = 20 * 20


    for _ in xrange(300):
      MultiProcess().start()

def multithreading():
    class MultiThread(threading.Thread):
        def __init__(self):
            threading.Thread.__init__(self)

        def run(self):
            # Alter string + test processing speed
            for i in xrange(NUM_RANGE):
                a = 20 * 20

    for _ in xrange(300):
      MultiThread().start()

print "process run time" + str(timefunc(multiprocess))
print "thread run time" + str(timefunc(multithreading))

Then I got 7.9 s for multiprocessing and 7.9 s for multithreading.

The main question I'm trying to answer is whether it is appropriate to use multithreading or multiprocessing for thousands of network requests, specifically on Linux. According to this code they seem to be the same in terms of start-up time, but perhaps processes are much heavier in memory usage?

Recommended answer

Your code is not suitable for benchmarking the start-up times of processes versus threads. Multithreaded Python code (in CPython) means single-core execution. While one thread holds the global interpreter lock (GIL), no other thread in the same process can execute Python code. This means that, as far as Python bytecode is concerned, threads give you only concurrency, not true parallelism.
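The sketch below illustrates the GIL point. It uses only the standard library. The helper names, the 0.2 s sleep and the loop size are assumptions for illustration and are not part of the original answer. On a standard CPython build with a GIL, a blocking call such as `time.sleep` releases the GIL. Here it stands in for waiting on a network reply, so several threads can wait at the same time. A pure-Python loop keeps the GIL while it runs, so splitting that loop across threads gives no speed-up.

```python
import threading
import time


def run_in_threads(target, n_threads):
    """Run `target` in n_threads threads and return the wall-clock time in seconds."""
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def blocking_wait():
    # Stand-in for waiting on a network reply: time.sleep releases the GIL.
    time.sleep(0.2)


def cpu_loop():
    # Pure-Python bytecode: the running thread holds the GIL while executing it.
    for _ in range(2_000_000):
        pass


if __name__ == "__main__":
    # Four 0.2 s waits overlap, so this should take roughly 0.2 s, not 0.8 s.
    print(f"4 blocking waits: {run_in_threads(blocking_wait, 4):.2f} s")
    # Four CPU loops should take about as long as running them one after another.
    print(f"1 CPU loop:       {run_in_threads(cpu_loop, 1):.2f} s")
    print(f"4 CPU loops:      {run_in_threads(cpu_loop, 4):.2f} s")
```

Exact timings depend on the machine. Compare the two pairs of numbers rather than reading anything into the absolute values.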

Your example mainly benchmarks the performance of a specific CPU-bound workload (a calculation running in a tight loop), which is something you wouldn't use threads for anyway. If you want to measure creation overhead, you must strip everything but the creation itself from your benchmark (as far as possible).
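Here is a minimal sketch of "strip everything but creation". The timed region contains only creating, starting and joining a thread whose target does nothing, and the minimum over many runs is reported. The names `noop` and `min_thread_overhead` and the run count are hypothetical. This variant includes `join()`, so it measures the full round trip. It does not isolate the point at which the new thread starts running, which is what the benchmark further down measures.

```python
import threading
import time


def noop():
    """Target that does no work, so only creation/scheduling cost is measured."""


def min_thread_overhead(n_runs=1000):
    """Return the minimum seconds for Thread(...) + start() + join() of a no-op target."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        t = threading.Thread(target=noop)
        t.start()
        t.join()
        timings.append(time.perf_counter() - start)
    # The minimum is the value least disturbed by scheduler noise and other load.
    return min(timings)


if __name__ == "__main__":
    print(f"min thread create+start+join: {min_thread_overhead() * 1e6:.1f} µs")
```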

TL;DR

Starting up a thread (benchmarked on Ubuntu 18.04) is many times cheaper than starting up a process.

Compared to thread start-up, process start-up with the specified start_method takes:

  • fork: ~33x longer
  • forkserver: ~6693x longer
  • spawn: ~7558x longer
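Each ratio above is the minimum process start-up time divided by the minimum thread start-up time from the same run. The small sketch below uses the "Minimum with" values from the full results. It reproduces the ratios up to rounding, because the printed minimums are themselves rounded. For the original question about thousands of requests, it also extrapolates naively to 1,000 start-ups done one after another. That extrapolation is an assumption: it ignores parallel start-up and reusing workers through a pool.

```python
# Minimum start-up times in seconds, copied from the results below:
# (process minimum, thread minimum measured in the same run).
MINIMUMS = {
    "fork": (1.47e-3, 43.89e-6),
    "spawn": (321.80e-3, 42.58e-6),
    "forkserver": (287.87e-3, 43.01e-6),
}


def ratio(process_min, thread_min):
    """How many times longer a process start-up takes than a thread start-up."""
    return process_min / thread_min


def sequential_cost(per_start, n_starts=1000):
    """Naive extrapolation: total seconds to start n_starts executors one after another."""
    return per_start * n_starts


for method, (proc_min, thread_min) in MINIMUMS.items():
    print(
        f"{method:>10}: ~{ratio(proc_min, thread_min):.0f}x, "
        f"1000 processes ~{sequential_cost(proc_min):.2f} s, "
        f"1000 threads ~{sequential_cost(thread_min) * 1e3:.1f} ms"
    )
```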

Full results are at the bottom.

Benchmark

I recently upgraded to Ubuntu 18.04 and tested start-up with a script that is hopefully closer to the truth. Note that this code is Python 3.

Some utilities for formatting and comparing the test results:

# thread_vs_proc_start_up.py
import sys
import time
import pandas as pd
from threading import Thread
import multiprocessing as mp
from multiprocessing import Process, Pipe


def format_secs(sec, decimals=2) -> str:
    """Format subseconds.

    Example:
    >>> format_secs(0.000_000_001)
    '1.00 ns'
    """
    if sec < 1e-6:
        return f"{sec * 1e9:.{decimals}f} ns"
    elif sec < 1e-3:
        return f"{sec * 1e6:.{decimals}f} µs"
    elif sec < 1:
        return f"{sec * 1e3:.{decimals}f} ms"
    else:
        return f"{sec:.{decimals}f} s"

def compare(value, base):
    """Return x-times relation of value and base."""
    return f"{(value / base):.2f}x"


def display_results(executor, result_series):
    """Display results for Executor."""
    exe_str = str(executor).split(".")[-1].strip('\'>')
    print(f"\nresults for {exe_str}:\n")

    print(result_series.describe().to_string(), "\n")
    print(f"Minimum with {format_secs(result_series.min())}")
    print("-" * 60)

The benchmark functions are below. For every single test out of n_runs, a fresh pipe is created. A new Process or Thread (an executor) starts, and its target function calc_start_up_time immediately sends the elapsed time back through the pipe. That's all.

def calc_start_up_time(pipe_in, start):
    pipe_in.send(time.perf_counter() - start)
    pipe_in.close()


def run(executor, n_runs):

    results = []
    for _ in range(int(n_runs)):
        pipe_out, pipe_in = Pipe(duplex=False)
        exe = executor(target=calc_start_up_time, args=(pipe_in,
                                                    time.perf_counter(),))
        exe.start()
        # Note: Measuring only the time for exe.start() returning like:
        # start = time.perf_counter()
        # exe.start()
        # end = time.perf_counter()
        # would not include the full time a new process needs to become
        # production ready.
        results.append(pipe_out.recv())
        pipe_out.close()
        exe.join()

    result_series = pd.Series(results)
    display_results(executor, result_series)
    return result_series.min()

It's built to be started from the terminal, with the start_method and the number of runs passed as command-line arguments. The benchmark always runs n_runs process start-ups with the specified start_method (available on Ubuntu 18.04: fork, spawn, forkserver) and then compares them with n_runs thread start-ups. The results focus on minimums because minimums show the best achievable start-up time.

if __name__ == '__main__':

    # Usage:
    # ------
    # Start from terminal with start_method and number of runs as arguments:
    #   $python thread_vs_proc_start_up.py fork 100
    #
    # Get all available start methods on your system with:
    # >>>import multiprocessing as mp
    # >>>mp.get_all_start_methods()

    start_method, n_runs = sys.argv[1:]
    mp.set_start_method(start_method)

    mins = []
    for executor in [Process, Thread]:
        mins.append(run(executor, n_runs))
    print(f"Minimum start-up time for processes takes "
          f"{compare(*mins)} "
          f"longer than for threads.")
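The script above handles one start method per invocation, because `mp.set_start_method()` should be called at most once per program. A second call raises `RuntimeError` unless `force=True` is passed. If you want to compare several methods in a single run, one alternative is `mp.get_context()`, which returns a separate context object per method. The sketch below takes that approach. It measures the same quantity as `calc_start_up_time` and uses only the standard library (`statistics.median` instead of pandas). The helper names and the run count of 10 are hypothetical. Like the original, it assumes `time.perf_counter()` values are comparable between parent and child.

```python
import multiprocessing as mp
import statistics
import time


def report_start_up(conn, start):
    """Child side: send how long it took from `start` until this code ran."""
    # Assumption (as in the original script): perf_counter is a system-wide clock
    # (CLOCK_MONOTONIC on Linux), so a value taken in the parent is valid here.
    conn.send(time.perf_counter() - start)
    conn.close()


def measure_method(method, n_runs):
    """Return (min, median) process start-up seconds for one start method."""
    ctx = mp.get_context(method)  # does not change the global start method
    results = []
    for _ in range(n_runs):
        recv_end, send_end = ctx.Pipe(duplex=False)
        proc = ctx.Process(target=report_start_up,
                           args=(send_end, time.perf_counter()))
        proc.start()
        # Close the parent's copy of the sending end: if the child dies before
        # sending, recv() raises EOFError instead of blocking forever.
        send_end.close()
        results.append(recv_end.recv())
        recv_end.close()
        proc.join()
    return min(results), statistics.median(results)


if __name__ == "__main__":
    for method in mp.get_all_start_methods():
        low, mid = measure_method(method, 10)
        print(f"{method:>10}: min {low * 1e3:.2f} ms, median {mid * 1e3:.2f} ms")
```

Run it as a script file. The spawn and forkserver methods need the `__main__` guard and a module-level target function.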



Results

With n_runs=1000 on a rusty machine:

# Ubuntu 18.04 start_method: fork
# ================================
results for Process:

count    1000.000000
mean        0.002081
std         0.000288
min         0.001466
25%         0.001866
50%         0.001973
75%         0.002268
max         0.003365 

Minimum with 1.47 ms
------------------------------------------------------------

results for Thread:

count    1000.000000
mean        0.000054
std         0.000013
min         0.000044
25%         0.000047
50%         0.000051
75%         0.000058
max         0.000319 

Minimum with 43.89 µs
------------------------------------------------------------
Minimum start-up time for processes takes 33.41x longer than for threads.


# Ubuntu 18.04 start_method: spawn
# ================================

results for Process:

count    1000.000000
mean        0.333502
std         0.008068
min         0.321796
25%         0.328776
50%         0.331763
75%         0.336045
max         0.415568 

Minimum with 321.80 ms
------------------------------------------------------------

results for Thread:

count    1000.000000
mean        0.000056
std         0.000016
min         0.000043
25%         0.000046
50%         0.000048
75%         0.000065
max         0.000231 

Minimum with 42.58 µs
------------------------------------------------------------
Minimum start-up time for processes takes 7557.80x longer than for threads.


# Ubuntu 18.04 start_method: forkserver
# =====================================


results for Process:

count    1000.000000
mean        0.295011
std         0.007157
min         0.287871
25%         0.291440
50%         0.293263
75%         0.296185
max         0.361581 

Minimum with 287.87 ms
------------------------------------------------------------

results for Thread:

count    1000.000000
mean        0.000055
std         0.000014
min         0.000043
25%         0.000045
50%         0.000047
75%         0.000064
max         0.000251 

Minimum with 43.01 µs
------------------------------------------------------------
Minimum start-up time for processes takes 6693.44x longer than for threads.
