What is some example code for demonstrating multicore speedup in Python on Windows?


Problem description

I'm using Python 3 on Windows and trying to construct a toy example, demonstrating how using multiple CPU cores can speed up computation. The toy example is rendering of the Mandelbrot fractal.

What I've done so far:

  • I avoided threading, since the Global Interpreter Lock rules out multicore speedup in that scenario
  • I discarded example code that won't run on Windows, which lacks Linux's fork functionality
  • I tried the "multiprocessing" package. I declare p = Pool(8) (8 being my number of cores) and delegate work with p.starmap(..). This should spawn several "subprocesses" that Windows automatically delegates to different CPUs

However, I'm unable to demonstrate any speedup, whether due to overhead or no actual multiprocessing. Pointers to toy examples with demonstrable speedup would therefore be very helpful :-)

Thank you! This pushed me in the right direction and I've now got a working example that demonstrates a doubling of speed on a CPU with 4 cores.
A copy of my code with "lecture notes" here: https://pastebin.com/c9HZ2vAV

I settled on using Pool() but will later try out the "Process" alternative that @16num pointed out. Below is a code example for Pool():

    from multiprocessing import Pool, cpu_count
    from functools import partial

    p = Pool(cpu_count())

    # map passes only a single argument to the target function;
    # "partial" binds the fixed dataarray argument, and starmap
    # unpacks each (j, k) tuple into the remaining parameters
    partial_calculatePixel = partial(calculatePixel, dataarray=data)
    koord = []
    for j in range(height):
        for k in range(width):
            koord.append((j,k))

    #Runs the calls to calculatePixel in a pool. "hmm" collects the output
    hmm = p.starmap(partial_calculatePixel,koord)

Answer

It's very simple to demonstrate a multiprocessing speed up:

import multiprocessing
import sys
import time

# high-resolution timer (time.clock was removed in Python 3.8)
get_timer = time.perf_counter

def cube_function(num):
    time.sleep(0.01)  # let's simulate it takes ~10ms for the CPU core to cube the number
    return num**3

if __name__ == "__main__":  # multiprocessing guard
    # we'll test multiprocessing with pools from one to the number of CPU cores on the system
    # it won't show significant improvements after that and it will soon start going
    # downhill due to the underlying OS thread context switches
    for workers in range(1, multiprocessing.cpu_count() + 1):
        pool = multiprocessing.Pool(processes=workers)
        # let's 'warm up' our pool so process startup doesn't affect our measurements
        pool.map(cube_function, range(multiprocessing.cpu_count()))
        # now to business: we'll cube 10000 numbers via our expensive function
        print("Cubing 10000 numbers over {} processes:".format(workers))
        timer = get_timer()  # time measuring starts now
        results = pool.map(cube_function, range(10000))  # map our range to the cube_function
        timer = get_timer() - timer  # get our delta time as soon as it finishes
        print("\tTotal: {:.2f} seconds".format(timer))
        print("\tAvg. per process: {:.2f} seconds".format(timer / workers))
        pool.close()  # let's clear out our pool for the next run
        time.sleep(1)  # waiting for a second to make sure everything is cleaned up

Of course, we're only simulating a 10 ms-per-number calculation here; you can replace cube_function with anything CPU-taxing for a real-world demonstration. The results are as expected:

Cubing 10000 numbers over 1 processes:
        Total: 100.01 seconds
        Avg. per process: 100.01 seconds
Cubing 10000 numbers over 2 processes:
        Total: 50.02 seconds
        Avg. per process: 25.01 seconds
Cubing 10000 numbers over 3 processes:
        Total: 33.36 seconds
        Avg. per process: 11.12 seconds
Cubing 10000 numbers over 4 processes:
        Total: 25.00 seconds
        Avg. per process: 6.25 seconds
Cubing 10000 numbers over 5 processes:
        Total: 20.00 seconds
        Avg. per process: 4.00 seconds
Cubing 10000 numbers over 6 processes:
        Total: 16.68 seconds
        Avg. per process: 2.78 seconds
Cubing 10000 numbers over 7 processes:
        Total: 14.32 seconds
        Avg. per process: 2.05 seconds
Cubing 10000 numbers over 8 processes:
        Total: 12.52 seconds
        Avg. per process: 1.57 seconds

Now, why not 100% linear? Well, first of all, it takes some time to map/distribute the data to the sub-processes and to get it back, there is some cost to context switching, there are other tasks that use my CPUs from time to time, and time.sleep() is not exactly precise (nor could it be on a non-RT OS)... But the results are roughly in the ballpark expected for parallel processing.
