Why is the following simple parallelized code much slower than a simple loop in Python?


Problem description




A simple program that calculates the squares of numbers and stores the results:

    import time
    from joblib import Parallel, delayed
    import multiprocessing

    array1 = [ 0 for i in range(100000) ]

    def myfun(i):
        return i**2

    #### Simple loop ####
    start_time = time.time()

    for i in range(100000):
        array1[i]=i**2

    print( "Time for simple loop         --- %s seconds ---" % (  time.time()
                                                               - start_time
                                                                 )
            )
    #### Parallelized loop ####
    start_time = time.time()
    results = Parallel( n_jobs  = -1,
                        verbose =  0,
                        backend = "threading"
                        )(
                        map( delayed( myfun ),
                             range( 100000 )
                             )
                        )
    print( "Time for parallelized method --- %s seconds ---" % (  time.time()
                                                               - start_time
                                                                 )
            )


    #### Output ####
    # >>> ( executing file "Test_vr20.py" )
    # Time for simple loop         --- 0.015599966049194336 seconds ---
    # Time for parallelized method --- 7.763299942016602 seconds ---

Could it be a difference in array handling between the two options? My actual program would have something more complicated, but this is the kind of calculation that I need to parallelize, as simply as possible, but not with results like these.

System Model: HP ProBook 640 G2, Windows 7,
              IDLE for Python System Type: x64-based PC Processor:
              Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz,
              2401 MHz,
              2 Core(s),
              4 Logical Processor(s)

Solution

From the documentation of threading:

If you know that the function you are calling is based on a compiled extension that releases the Python Global Interpreter Lock (GIL) during most of its computation ...

The problem is that, in this case, you don't know that. Python itself will only allow one thread to run at once (the interpreter acquires the GIL every time it executes a Python operation).

threading is only going to be useful if myfun() spends most of its time in a compiled Python extension, and that extension releases the GIL.
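To see the distinction concretely, here is a minimal stdlib sketch (using `concurrent.futures` rather than joblib, so the names differ from the question's code). `time.sleep` releases the GIL, so these calls genuinely overlap across threads; a pure-Python body like `i**2` would not overlap this way:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def gil_releasing_task(_):
    # time.sleep releases the GIL while waiting, so four of these
    # can run concurrently in four threads. A CPU-bound pure-Python
    # body (like i ** 2) would be serialized by the GIL instead.
    time.sleep(0.2)
    return True

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(gil_releasing_task, range(4)))
elapsed = time.time() - start

# Four 0.2 s sleeps overlap, so wall time is ~0.2 s, not ~0.8 s.
print(f"4 overlapping sleeps took {elapsed:.2f}s (serial would be ~0.8s)")
```

The same overlap happens for compiled extensions (NumPy, many I/O calls) that release the GIL during their work.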

The Parallel code is so embarrassingly slow because you are doing a huge amount of work to create and dispatch to multiple threads, and then you still only execute one thread at a time anyway.
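The per-task dispatch overhead is easy to measure. A rough stdlib sketch (again with `concurrent.futures` standing in for joblib) that times a plain loop against a thread pool on the same tiny tasks:

```python
import time
from concurrent.futures import ThreadPoolExecutor

N = 20000

# Plain loop: no dispatch machinery at all.
start = time.time()
plain = [i ** 2 for i in range(N)]
loop_time = time.time() - start

# Thread pool: every tiny i ** 2 task pays queueing/wakeup overhead,
# and the GIL serializes the actual computation anyway.
start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(lambda i: i ** 2, range(N)))
pool_time = time.time() - start

print(f"plain loop: {loop_time:.4f}s, thread pool: {pool_time:.4f}s")
```

Both produce the same list; only the overhead differs, by several orders of magnitude for work this small.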

If you use the multiprocessing backend, then you have to copy the input data into each of four or eight processes (one per core), do the processing in each process, and then copy the output data back. The copying is going to be slow, but if the processing is a little bit more complex than just calculating a square, it might be worth it. Measure and see.
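When you do try processes, hand each worker a whole chunk rather than one number at a time, so the copy/dispatch cost is amortized. A hedged sketch using the stdlib's `ProcessPoolExecutor` (the function names `square_chunk` and `parallel_squares` are made up for this illustration):

```python
from concurrent.futures import ProcessPoolExecutor

def square_chunk(bounds):
    # Each task squares an entire slice, so the per-task pickling and
    # dispatch overhead is spread over many computations.
    lo, hi = bounds
    return [i ** 2 for i in range(lo, hi)]

def parallel_squares(n, workers=2):
    # Split [0, n) into one contiguous chunk per worker.
    step = max(1, n // workers)
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    out = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map preserves chunk order, so the result matches a plain loop.
        for part in pool.map(square_chunk, chunks):
            out.extend(part)
    return out

if __name__ == "__main__":
    print(parallel_squares(10))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Even with chunking, `i ** 2` is far too cheap for processes to win; the pattern only pays off once each chunk does real work.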
