Numerical simulations with multiprocessing much slower than hoped: am I doing anything wrong? Can I speed it up?

Problem description

I am running set of numerical simulations. I need to run some sensitivity analyses on the results, i.e. calculate and show how much certain outputs change, as certain inputs vary within given ranges. Basically I need to create a table like this, where each row is the result of one model run:

+-------------+-------------+-------------+-------------+
|   Input 1   |   Input 2   |  Output 1   |  Output 2   |
+-------------+-------------+-------------+-------------+
| 0.708788979 | 0.614576315 | 0.366315092 | 0.476088865 |
| 0.793662551 | 0.938622754 | 0.898870204 | 0.014915374 |
| 0.366560694 | 0.244354275 | 0.740988568 | 0.197036087 |
+-------------+-------------+-------------+-------------+

Each model run is tricky to parallelise, but it shouldn't be too hard to parallelise by getting each CPU to run a different model with different inputs.

I have put something together with the multiprocessing library, but it is much slower than I would have hoped. Do you have any suggestions on what I am doing wrong / how I can speed it up? I am open to using a library other than multiprocessing.

Does it have to do with load balancing? I must confess I am new to multiprocessing in Python and am not too clear on the differences among map, apply, and apply_async.

I have made a toy example to show what I mean: I create random samples from a lognormal distribution, and calculate how much the mean of my sample changes as the mean and sigma of the distribution change. This is just a banal example because what matters here is not the model itself, but running multiple models in parallel.

In my example, the times (in seconds) are:

+-----------------+-----------------+---------------------+
| Million records | Time (parallel) | Time (not parallel) |
+-----------------+-----------------+---------------------+
|               5 | 24.4            | 18                  |
|              10 | 26.5            | 35.8                |
|              20 | 32.2            | 71                  |
+-----------------+-----------------+---------------------+

Parallelisation only brings any benefit with sample sizes between 5 and 10 million. Is this to be expected?

P.S. I am aware of the SALib library for sensitivity analyses, but, as far as I can see, it doesn't do what I'm after.

My code:

import numpy as np
import pandas as pd
import time
import multiprocessing
from multiprocessing import Pool

# I store all the possible inputs in a dataframe
tmp = {}
i = 0
for mysigma in np.linspace(0,1,10):
    for mymean in np.linspace(0,1,10):
        i += 1
        tmp[i] = pd.DataFrame({'mean': [mymean],
                               'sigma': [mysigma]})
par_inputs = pd.concat([tmp[x] for x in tmp], axis=0, ignore_index=True)


# runs all model evaluations sequentially in a single process
def not_parallel(df):
    for row in df.itertuples(index=True):
        myindex = row[0]
        mymean = row[1]
        mysigma = row[2]
        dist = np.random.lognormal(mymean, mysigma, size = n)
        empmean = dist.mean()
        df.loc[myindex,'empirical mean'] = empmean

    df.to_csv('results not parallel.csv')

# splits the dataframe and sets up the parallelisation
def parallelize_dataframe(df, func):
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    conc_df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    conc_df.to_csv('results parallelized.csv')
    return conc_df

# the actual function being parallelised
def parallel_sensitivities(data):   
    for row in data.itertuples(index=True):
        myindex = row[0]
        mymean = row[1]
        mysigma = row[2]
        dist = np.random.lognormal(mymean, mysigma, size = n)
        empmean = dist.mean()
        print(empmean)
        data.loc[myindex,'empirical mean'] = empmean
    return data


num_cores = multiprocessing.cpu_count()
num_partitions = num_cores
n = int(5e6)

if __name__ == '__main__':

    start = time.time()
    not_parallel(par_inputs)
    time_np = time.time() - start
    print('Time (not parallel):', time_np)

    start = time.time()
    parallelize_dataframe(par_inputs, parallel_sensitivities)
    time_p = time.time() - start
    print('Time (parallel):', time_p)

Answer

The time difference comes from starting up the multiple processes. Starting each process takes some amount of time. The actual processing time is much better than in the non-parallel version, but part of the cost of the multiprocessing speed-up is accepting the time it takes to start each process.

In this case, your example function is relatively fast (a matter of seconds), so you don't see the time gain immediately on a small number of records. For more intensive operations on each record you would see much more significant time gains by parallelizing.
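
One way to see that fixed start-up cost in isolation is to time a pool that does essentially no work; a minimal sketch (the worker count of 4 and the no-op task are arbitrary choices for illustration):

import time
from multiprocessing import Pool

def noop(x):
    # a task that does essentially no work, so any measured time is overhead
    return x

if __name__ == '__main__':
    start = time.time()
    with Pool(4) as pool:           # creating the 4 worker processes has a fixed cost
        pool.map(noop, range(100))  # the "work" itself is negligible
    print('pool overhead (s):', time.time() - start)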

Keep in mind that parallelization is both costly, and time-consuming due to the overhead of the subprocesses that is needed by your operating system. Compared to running two or more tasks in a linear way, doing this in parallel you may save between 25 and 30 percent of time per subprocess, depending on your use-case. For example, two tasks that consume 5 seconds each need 10 seconds in total if executed in series, and may need about 8 seconds on average on a multi-core machine when parallelized. 3 of those 8 seconds may be lost to overhead, limiting your speed improvements.

From this article.

When using a Pool(), you have a few options to assign tasks to the pool.

multiprocessing.Pool.apply_async() (docs) is used to submit a single task to the pool without blocking while waiting for that task to complete.
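
For example, a minimal sketch of the apply_async approach applied to the question's grid of inputs; the run_model helper, the sample size n and the input grid here are illustrative assumptions rather than the original code:

import numpy as np
from multiprocessing import Pool

n = int(1e6)  # assumed sample size per run, smaller than the question's 5e6

def run_model(mean, sigma):
    # one self-contained model run: the empirical mean of a lognormal sample
    dist = np.random.lognormal(mean, sigma, size=n)
    return mean, sigma, dist.mean()

if __name__ == '__main__':
    inputs = [(m, s) for s in np.linspace(0, 1, 10) for m in np.linspace(0, 1, 10)]
    with Pool() as pool:
        # apply_async submits each task individually and returns immediately
        handles = [pool.apply_async(run_model, args) for args in inputs]
        # .get() blocks until the corresponding task has finished
        results = [h.get() for h in handles]
    print(results[:3])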

multiprocessing.Pool.map_async() (docs) splits an iterable into chunks of chunksize and adds each chunk to the pool to be completed.
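
The same hypothetical run_model, rewritten to take a single (mean, sigma) tuple so it can be mapped over an iterable, could be used with map_async; in this sketch the results table from the question is only assembled at the end, instead of splitting and re-concatenating DataFrames:

import numpy as np
import pandas as pd
from multiprocessing import Pool

n = int(1e6)  # assumed sample size per run

def run_model(params):
    # single-argument version so it can be mapped over an iterable of (mean, sigma) pairs
    mean, sigma = params
    dist = np.random.lognormal(mean, sigma, size=n)
    return mean, sigma, dist.mean()

if __name__ == '__main__':
    inputs = [(m, s) for s in np.linspace(0, 1, 10) for m in np.linspace(0, 1, 10)]
    with Pool() as pool:
        # map_async splits `inputs` into chunks of `chunksize` tasks per worker dispatch
        async_result = pool.map_async(run_model, inputs, chunksize=10)
        results = async_result.get()  # block until all chunks have finished
    # build the results table once at the end
    df = pd.DataFrame(results, columns=['mean', 'sigma', 'empirical mean'])
    print(df.head())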

In your case it will depend on your real scenario, but the two aren't interchangeable based on timing; the choice depends on what function you need to run. I won't say for sure which one you need, since you used a toy example. I'm guessing you could use apply_async if each call has to run on its own and the function is self-contained, and map_async if the function can be run in parallel over an iterable.
