Why is the following simple parallelized code much slower than a simple loop in Python?

Question

A simple program which calculates square of numbers and stores the results:

    import time
    from joblib import Parallel, delayed

    N = 100000
    array1 = [0] * N    # preallocated result list for the simple loop

    def myfun(i):
        """Return the square of i."""
        return i**2

    #### Simple loop ####
    start_time = time.time()

    for i in range(N):
        array1[i] = i**2

    print("Time for simple loop         --- %s seconds ---" % (time.time() - start_time))

    #### Parallelized loop ####
    start_time = time.time()
    # One delayed myfun(i) task per element, dispatched to the threading backend
    results = Parallel(n_jobs=-1, verbose=0, backend="threading")(
        delayed(myfun)(i) for i in range(N)
    )
    print("Time for parallelized method --- %s seconds ---" % (time.time() - start_time))

    #### Output ####
    # >>> ( executing file "Test_vr20.py" )
    # Time for simple loop         --- 0.015599966049194336 seconds ---
    # Time for parallelized method --- 7.763299942016602 seconds ---

Could the difference come from how the two versions handle the array? My actual program is more complicated, but this is the kind of calculation I need to parallelize, as simply as possible, and without results like these.

System Model: HP ProBook 640 G2, Windows 7, IDLE for Python
System Type: x64-based PC
Processor: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz, 2401 MHz, 2 Core(s), 4 Logical Processor(s)

Recommended answer

From the documentation on the threading backend:

If you know that the function you are calling is based on a compiled extension that releases the Python Global Interpreter Lock (GIL) during most of its computation ...

The problem is that in this case, you don't know that. Python itself will only allow one thread to run at a time (the Python interpreter holds the GIL whenever it executes a Python operation).

threading is only going to be useful if myfun() spends most of its time in a compiled Python extension, and that extension releases the GIL.
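To make that distinction concrete, here is a minimal sketch that runs two kinds of work on the same thread pool. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in for joblib's threading backend, and the helpers `digest`, `pure_python` and `timed` are hypothetical names introduced only for this illustration. `hashlib` is used because its documentation states that the GIL is released while hashing large buffers. `pure_python` is plain interpreter work that holds the GIL throughout.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def digest(block):
    # Compiled code: hashlib releases the GIL while hashing a large buffer.
    return hashlib.sha256(block).hexdigest()

def pure_python(n):
    # Interpreter-level loop: the GIL is held for the whole computation.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(label, func):
    # Run func once, print how long it took, and return its result.
    start = time.perf_counter()
    result = func()
    print("%-24s %.3f s" % (label, time.perf_counter() - start))
    return result

blocks = [bytes(8 * 1024 * 1024) for _ in range(8)]  # eight 8 MiB buffers of zeros
sizes = [200_000] * 8                                # eight identical pure-Python jobs

with ThreadPoolExecutor(max_workers=4) as pool:
    serial_hashes = timed("hashlib, serial", lambda: [digest(b) for b in blocks])
    threaded_hashes = timed("hashlib, 4 threads", lambda: list(pool.map(digest, blocks)))
    serial_sums = timed("pure Python, serial", lambda: [pure_python(n) for n in sizes])
    threaded_sums = timed("pure Python, 4 threads", lambda: list(pool.map(pure_python, sizes)))
```

On a multi-core machine one would expect the threaded hashing to finish faster than the serial hashing, while the threaded pure-Python run shows little or no gain. The exact numbers depend on the machine and the Python build, so treat this as something to measure rather than a guaranteed result.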

The Parallel code is so embarrassingly slow because you are doing a huge amount of work to create multiple threads - and then you only execute one thread at a time anyway.
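The per-task overhead can be measured with a small sketch. It again assumes the standard library's `ThreadPoolExecutor` in place of joblib, so the absolute timings will differ from those in the question. The helper `square_chunk` and the chunk size of 25,000 are illustrative choices, not part of the original code. The sketch compares the plain loop, one task per element, and a handful of large chunks.

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 100_000
CHUNK = 25_000  # illustrative chunk size, giving four tasks in total

def myfun(i):
    return i ** 2

def square_chunk(chunk):
    # Process a whole range in one task so the dispatch cost is paid once per chunk.
    return [myfun(i) for i in chunk]

# Baseline: ordinary loop, no threads involved.
start = time.perf_counter()
loop_result = [myfun(i) for i in range(N)]
print("simple loop        %.3f s" % (time.perf_counter() - start))

with ThreadPoolExecutor(max_workers=4) as pool:
    # One task per element: 100,000 submissions, each doing almost no work.
    start = time.perf_counter()
    per_item_result = list(pool.map(myfun, range(N)))
    print("one task per item  %.3f s" % (time.perf_counter() - start))

    # A few large tasks: same work, far fewer submissions.
    chunks = [range(lo, min(lo + CHUNK, N)) for lo in range(0, N, CHUNK)]
    start = time.perf_counter()
    chunked_result = [sq for part in pool.map(square_chunk, chunks) for sq in part]
    print("four large chunks  %.3f s" % (time.perf_counter() - start))
```

Chunking removes most of the dispatch cost, but because `myfun` is pure Python the threads still take turns on the GIL, so the chunked version should not be expected to beat the plain loop.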

If you use the multiprocessing backend, then you have to copy the input data into each of four or eight processes (one per core), do the processing in each process, and then copy the output data back. The copying is going to be slow, but if the processing is a little more complex than just calculating a square, it might be worth it. Measure and see.
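One rough way to start measuring is to compare the cost of the computation with the cost of serializing its inputs and outputs. Python's multiprocessing module moves arguments and results between processes by pickling them. The sketch below times only a pickle round trip inside a single process, ignoring process start-up and the actual transfer, so it understates the real copying cost.

```python
import pickle
import time

N = 100_000
inputs = list(range(N))

# Cost of the actual work: squaring every element.
start = time.perf_counter()
outputs = [i ** 2 for i in inputs]
compute_time = time.perf_counter() - start

# Cost of serializing and deserializing the inputs and the outputs once each,
# as a stand-in for sending them to a worker process and back.
start = time.perf_counter()
inputs_copy = pickle.loads(pickle.dumps(inputs))
outputs_copy = pickle.loads(pickle.dumps(outputs))
copy_time = time.perf_counter() - start

print("compute           %.4f s" % compute_time)
print("pickle round trip %.4f s" % copy_time)
```

If the copying time is comparable to or larger than the compute time, as is plausible for a job as small as squaring integers, separate processes are unlikely to pay off. A heavier `myfun` shifts the balance toward the multiprocessing backend.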
