Concurrency/Parallelism on Windows with Python


Question

I developed a simple program to solve the eight queens problem. Now I would like to do some more testing with different meta-parameters, so I would like to make it fast. I went through a few iterations of profiling and was able to cut the runtime significantly, but I reached the point where I believe only running parts of the computation concurrently could make it faster. I tried the multiprocessing and concurrent.futures modules, but neither improved the runtime much, and in some cases they even slowed execution down. That is just to give some context.

I was able to come up with a similar code structure where the sequential version beats the concurrent one.

import numpy as np
import concurrent.futures
import math
import time
import multiprocessing

def is_prime(n):
    # trial division; handle n <= 2 explicitly so that 2 is not rejected
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def generate_data(seed):
    np.random.seed(seed)
    numbers = []
    for _ in range(5000):
        nbr = np.random.randint(50000, 100000)
        numbers.append(nbr)
    return numbers

def run_test_concurrent(numbers):
    print("Concurrent test")
    start_tm = time.time()
    chunk = len(numbers)//3
    primes = None
    with concurrent.futures.ProcessPoolExecutor(max_workers=3) as pool:
        primes = list(pool.map(is_prime, numbers, chunksize=chunk))
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def run_test_sequential(numbers):
    print("Sequential test")
    start_tm = time.time()
    primes = [is_prime(nbr) for nbr in numbers]
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def run_test_multiprocessing(numbers):
    print("Multiprocessing test")
    start_tm = time.time()
    chunk = len(numbers)//3
    primes = None
    with multiprocessing.Pool(processes=3) as pool:
        primes = list(pool.map(is_prime, numbers, chunksize=chunk))
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def main():
    nbr_trials = 5
    for trial in range(nbr_trials):
        numbers = generate_data(trial*10)
        run_test_concurrent(numbers)
        run_test_sequential(numbers)
        run_test_multiprocessing(numbers)
        print("--\n")


if __name__ == '__main__':
    main()

When I run it on my machine (Windows 7, Intel Core i5 with four cores), I get the following output:

Concurrent test
Time: 2.006006
Number of primes: 431

Sequential test
Time: 0.010000
Number of primes: 431

Multiprocessing test
Time: 1.412003
Number of primes: 431
--

Concurrent test
Time: 1.302003
Number of primes: 447

Sequential test
Time: 0.010000
Number of primes: 447

Multiprocessing test
Time: 1.252003
Number of primes: 447
--

Concurrent test
Time: 1.280002
Number of primes: 446

Sequential test
Time: 0.010000
Number of primes: 446

Multiprocessing test
Time: 1.250002
Number of primes: 446
--

Concurrent test
Time: 1.260002
Number of primes: 446

Sequential test
Time: 0.010000
Number of primes: 446

Multiprocessing test
Time: 1.250002
Number of primes: 446
--

Concurrent test
Time: 1.282003
Number of primes: 473

Sequential test
Time: 0.010000
Number of primes: 473

Multiprocessing test
Time: 1.260002
Number of primes: 473
--

The question I have is whether I can somehow make this faster by running it concurrently on Windows with Python 3.6.4 |Anaconda, Inc.|. I read here on SO (Why is creating a new process more expensive on Windows than Linux?) that creating new processes on Windows is expensive. Is there anything that can be done to speed things up? Am I missing something obvious?

I also tried creating the Pool only once, but it did not seem to help much.

The original code is structured more or less like this:

class Foo(object):

    def g(self) -> int:
        # function performing simple calculations;
        # a single call is fast (~500 ms)
        pass

    def run(self):
        nbr_processes = multiprocessing.cpu_count() - 1

        with multiprocessing.Pool(processes=nbr_processes) as pool:
            foos = get_initial_foos()

            solution_found = False
            while not solution_found:
                # one iteration
                chunk = len(foos) // nbr_processes
                vals = list(pool.map(Foo.g, foos, chunksize=chunk))

                foos = modify_foos()
with foos having 1000 elements. It is not possible to tell in advance how quickly the algorithm converges and how many iterations are executed, possibly thousands.

Answer

Your setup is not really fair to multiprocessing. You even included unnecessary primes = None assignments. ;)

Some points:

Data size

Your generated data is far too little for the overhead of process creation to be earned back. Try range(1_000_000) instead of range(5000). On Linux with multiprocessing.start_method set to 'spawn' (the default on Windows), this draws a different picture:

Concurrent test
Time: 0.957883
Number of primes: 89479

Sequential test
Time: 1.235785
Number of primes: 89479

Multiprocessing test
Time: 0.714775
Number of primes: 89479
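For reference, the enlarged input can be generated like this (a minimal sketch; passing size to np.random.randint is equivalent to the original append loop, just vectorized):

def generate_data(seed):
    np.random.seed(seed)
    # 1,000,000 numbers instead of 5,000, so there is enough work
    # to amortize the cost of spawning the worker processes
    return list(np.random.randint(50_000, 100_000, size=1_000_000))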


Reuse your pool

Don't leave the with-block of the pool as long as there is any code left in your program that you want to parallelize later. If you create the pool only once at the beginning, it doesn't make much sense to include pool creation in your benchmark at all.
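A minimal sketch of that idea, assuming the is_prime and generate_data functions from the question: the pool is created once, and every trial reuses the same already-spawned workers, so the per-trial spawn cost disappears from the measurement.

def main():
    with multiprocessing.Pool(processes=3) as pool:
        for trial in range(5):
            numbers = generate_data(trial * 10)
            chunk = len(numbers) // 3
            # same workers on every iteration; no new processes spawned
            primes = pool.map(is_prime, numbers, chunksize=chunk)
            print("Number of primes: {}".format(sum(primes)))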

NumPy

NumPy is in parts able to release the global interpreter lock (GIL). This means you can benefit from multi-core parallelism without the overhead of process creation. If you're doing math anyway, try to utilize NumPy as much as possible. Try concurrent.futures.ThreadPoolExecutor and multiprocessing.dummy.Pool with code that uses NumPy.
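As a rough sketch of that thread-based variant (not code from the answer): the vectorized trial division below spends its time in NumPy's elementwise operations, which release the GIL, so the threads can actually run on separate cores. Like the benchmark's is_prime, it assumes every input is much larger than the largest trial divisor.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def count_primes_vectorized(chunk):
    # elementwise NumPy ops run with the GIL released, so several of
    # these calls can execute truly in parallel on separate threads
    chunk = np.asarray(chunk, dtype=np.int64)
    candidates = chunk % 2 != 0  # inputs >> 2, so all evens are composite
    for d in range(3, int(np.sqrt(chunk.max())) + 1, 2):
        candidates &= chunk % d != 0
    return int(np.count_nonzero(candidates))

def count_primes_threaded(numbers, n_workers=4):
    # threads, not processes: no spawn cost, shared memory
    chunks = np.array_split(np.asarray(numbers), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        return sum(executor.map(count_primes_vectorized, chunks))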

