python concurrent.futures.ProcessPoolExecutor: Performance of .submit() vs .map()


Problem Description



I am using concurrent.futures.ProcessPoolExecutor to find the occurrence of a number from a number range. The intent is to investigate the amount of speed-up performance gained from concurrency. To benchmark performance, I have a control - a serial code to perform said task (shown below). I have written 2 concurrent codes, one using concurrent.futures.ProcessPoolExecutor.submit() and the other using concurrent.futures.ProcessPoolExecutor.map() to perform the same task. They are shown below. Advice on drafting the former and latter can be seen here and here, respectively.

The task issued to all three codes was to find the number of occurrences of the number 5 in the number range of 0 to 1E8. Both .submit() and .map() were assigned 6 workers, and .map() had a chunksize of 10,000. The manner of discretising the workload was identical in the concurrent codes. However, the function used to find occurrences was different in each code, because the way arguments are passed to a function called by .submit() and by .map() differs.
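To make that argument-passing difference concrete, here is a minimal toy sketch on a tiny range (the function and variable names are mine, not from the benchmark code): .submit() receives one whole sub-range per task, while .map() as used here receives one number per call, batched internally by its chunksize argument.

```python
import concurrent.futures as cf
import itertools

def find_in_range(nmin, nmax, digit):
    '''.submit()-style worker: receives a whole sub-range per task.'''
    return [n for n in range(nmin, nmax) if digit in str(n)]

def check_one(n, digit):
    '''.map()-style worker: receives a single number per call.'''
    return n if digit in str(n) else None

if __name__ == '__main__':
    with cf.ProcessPoolExecutor(max_workers=2) as executor:
        # One task covering the whole toy range:
        by_submit = executor.submit(find_in_range, 0, 100, '5').result()
        # One call per number, batched 10 at a time for pickling:
        by_map = [n for n in executor.map(check_one, range(100),
                                          itertools.repeat('5'),
                                          chunksize=10)
                  if n is not None]
    # Same occurrences either way; only the per-call overhead differs.
    assert by_submit == by_map
```

Both styles find the same matches; the difference under test in this question is purely the dispatch overhead.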

All 3 codes reported the same number of occurrences, i.e. 56,953,279 times. However, the time taken to complete the task was very different. .submit() performed 2 times faster than the control, while .map() took twice as long as the control to complete its task.

Questions:

  1. I would like to know if the slow performance of .map() is an artifact of my coding or it is inherently slow? If the former, how can I improve it? I am just surprised that it performed slower than the control, as there would then be not much incentive to use it.
  2. I would like to know if there is any way to make the .submit() code perform even faster. A condition I have is that the function _concurrent_submit() must return an iterable with the numbers/occurrences containing the number 5.

Benchmark Results

concurrent.futures.ProcessPoolExecutor.submit()

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

import concurrent.futures as cf
from time import time
from traceback import print_exc

def _findmatch(nmin, nmax, number):
    '''Function to find the occurrence of number in range nmin to nmax and return
       the found occurrences in a list.'''
    print('\n def _findmatch', nmin, nmax, number)
    start = time()
    match=[]
    for n in range(nmin, nmax):
        if number in str(n):
            match.append(n)
    end = time() - start
    print("found {0} in {1:.4f}sec".format(len(match),end))
    return match

def _concurrent_submit(nmax, number, workers):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.submit to
       find the occurences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunk = nmax // workers
    futures = []
    found =[]
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(workers):
            cstart = chunk * i
            cstop = chunk * (i + 1) if i != workers - 1 else nmax
            futures.append(executor.submit(_findmatch, cstart, cstop, number))
        # 2.2. Instruct workers to process results as they come, when all are
        #      completed or .....
        cf.as_completed(futures) # faster than cf.wait()
        # 2.3. Consolidate result as a list and return this list.
        for future in futures:
            for f in future.result():
                try:
                    found.append(f)
                except:
                    print_exc()
        foundsize = len(found)
        end = time() - start
        print('within statement of def _concurrent_submit():')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers

    start = time()
    a = _concurrent_submit(nmax, number, workers)
    end = time() - start
    print('\n main')
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(len(a),end))

concurrent.futures.ProcessPoolExecutor.map()

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

import concurrent.futures as cf
import itertools
from time import time
from traceback import print_exc

def _findmatch(listnumber, number):    
    '''Function to find the occurrence of number in another number and return
       a string value.'''
    #print('def _findmatch(listnumber, number):')
    #print('listnumber = {0} and ref = {1}'.format(listnumber, number))
    if number in str(listnumber):
        x = listnumber
        #print('x = {0}'.format(x))
        return x 

def _concurrent_map(nmax, number, workers):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunk = nmax // workers
    futures = []
    found =[]
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(workers):
            cstart = chunk * i
            cstop = chunk * (i + 1) if i != workers - 1 else nmax
            numberlist = range(cstart, cstop)
            futures.append(executor.map(_findmatch, numberlist,
                                        itertools.repeat(number),
                                        chunksize=10000))
        # 2.3. Consolidate result as a list and return this list.
        for future in futures:
            for f in future:
                if f:
                    try:
                        found.append(f)
                    except:
                        print_exc()
        foundsize = len(found)
        end = time() - start
        print('within statement of def _concurrent(nmax, number):')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers

    start = time()
    a = _concurrent_map(nmax, number, workers)
    end = time() - start
    print('\n main')
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(len(a),end))

Serial Code:

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

from time import time

def _serial(nmax, number):    
    start = time()
    match=[]
    nlist = range(nmax)
    for n in nlist:
        if number in str(n):match.append(n)
    end=time()-start
    print("found {0} in {1:.4f}sec".format(len(match),end))
    return match

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.

    start = time()
    a = _serial(nmax, number)
    end = time() - start
    print('\n main')
    print("found {0} in {1:.4f}sec".format(len(a),end))

Update 13th Feb 2017:

In addition to @niemmi's answer, I have provided an answer following some personal research to show:

  1. how to further speed up @niemmi's .map() and .submit() solutions, and
  2. when ProcessPoolExecutor.map() can lead to more speed-up than ProcessPoolExecutor.submit().

Solution

Overview:

There are 2 parts to my answer:

  • Part 1 shows how to gain more speed-up from @niemmi's ProcessPoolExecutor.map() solution.
  • Part 2 shows when the ProcessPoolExecutor methods .submit() and .map() yield non-equivalent compute times.

=======================================================================

Part 1: More Speed-up for ProcessPoolExecutor.map()

Background: This section builds on @niemmi's .map() solution, which by itself is excellent. While doing some research on his discretization scheme to better understand how it interacts with the chunksize argument of .map(), I found this interesting solution.

I regard @niemmi's definition of chunk = nmax // workers to be a definition of chunksize, i.e. a smaller sub-range of the actual number range (the given task) to be tackled by each worker in the worker pool. Now, this definition is premised on the assumption that if a computer has x workers, dividing the task equally among them will result in optimum use of each worker, and hence the total task will be completed fastest. Therefore, the number of chunks to break a given task into should always equal the number of pool workers. However, is this assumption correct?
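The boundary arithmetic behind that discretisation can be sketched as follows (chunk_bounds is my own toy helper; the answer's revised code below expresses the same bounds as generator expressions):

```python
# Split [0, nmax) into num_of_chunks half-open intervals, letting the
# last chunk absorb the remainder when nmax is not an exact multiple
# of num_of_chunks.
def chunk_bounds(nmax, num_of_chunks):
    chunksize = nmax // num_of_chunks
    cstart = [chunksize * i for i in range(num_of_chunks)]
    cstop = [chunksize * i if i != num_of_chunks else nmax
             for i in range(1, num_of_chunks + 1)]
    return list(zip(cstart, cstop))

bounds = chunk_bounds(100, 7)   # 7 chunks over 0..99, chunksize = 14
assert bounds[0] == (0, 14)
assert bounds[-1] == (84, 100)  # last chunk absorbs the remainder
# Consecutive chunks tile the range with no gaps or overlaps:
assert all(b[1] == nb[0] for b, nb in zip(bounds, bounds[1:]))
```

Because only the last chunk is widened, every other chunk stays exactly chunksize long regardless of how many chunks are requested.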

Proposition: Here, I propose that the above assumption does not always lead to the fastest compute time when used with ProcessPoolExecutor.map(). Rather, discretising a task to an amount greater than the number of pool workers can lead to speed-up, i.e. faster completion of a given task.

Experiment: I have modified @niemmi's code to allow the number of discretized tasks to exceed the number of pool workers. This code is given below and was used to find the number of times the number 5 appears in the number range of 0 to 1E8. I executed this code using 1, 2, 4, and 6 pool workers and for various ratios of the number of discretized tasks to the number of pool workers. For each scenario, 3 runs were made and the compute times tabulated. "Speed-up" is defined here as the average compute time when using an equal number of chunks and pool workers divided by the average compute time when the number of discretized tasks is greater than the number of pool workers.

Findings:

  1. Figure on left shows the compute time taken by all the scenarios mentioned in the experiment section. It shows that the compute time taken by number of chunks / number of workers = 1 is always greater than the compute time taken by number of chunks > number of workers. That is, the former case is always less efficient than the latter.

  2. Figure on right shows that a speed-up of 1.2 times or more was gained when the number of chunks / number of workers reach a threshold value of 14 or more. It is interesting to observe that the speed-up trend also occurred when ProcessPoolExecutor.map() was executed with 1 worker.

Conclusion: When customizing the number of discrete tasks that ProcessPoolExecutor.map() should use to solve a given task, it is prudent to ensure that this number is greater than the number of pool workers, as this practice shortens compute time.

concurrent.futures.ProcessPoolExecutor.map() code. (revised parts only)

def _concurrent_map(nmax, number, workers, num_of_chunks):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunksize = nmax // num_of_chunks
    futures = []
    found =[]
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        cstart = (chunksize * i for i in range(num_of_chunks))
        cstop = (chunksize * i if i != num_of_chunks else nmax
                 for i in range(1, num_of_chunks + 1))
        futures = executor.map(_findmatch, cstart, cstop,
                               itertools.repeat(number))
        # 2.2. Consolidate result as a list and return this list.
        for future in futures:
            #print('type(future)=',type(future))
            for f in future:
                if f:
                    try:
                        found.append(f)
                    except:
                        print_exc()
        foundsize = len(found)
        end = time() - start
        print('\n within statement of def _concurrent(nmax, number):')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 4     # Pool of workers
    chunks_vs_workers = 14 # A factor of =>14 can provide optimum performance  
    num_of_chunks = chunks_vs_workers * workers

    start = time()
    a = _concurrent_map(nmax, number, workers, num_of_chunks)
    end = time() - start
    print('\n main')
    print('nmax={}, workers={}, num_of_chunks={}'.format(
          nmax, workers, num_of_chunks))
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(len(a),end))

=======================================================================

Part 2: Total compute time from using the ProcessPoolExecutor methods .submit() and .map() can be dissimilar when returning a sorted/ordered result list.

Background: I have amended both the .submit() and .map() codes to allow an "apple-to-apple" comparison of their compute times and the ability to visualize the compute time of the main code, the compute time of the _concurrent method called by the main code to perform the concurrent operations, and the compute time of each discretized task/worker called by the _concurrent method. Furthermore, the concurrent method in these codes was structured to return an unordered and an ordered list of the results directly from the future objects of .submit() and the iterator of .map(). Source code is provided below (hope it helps you).
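As a toy illustration of the two consolidation styles (with made-up per-chunk results standing in for the futures' output):

```python
from itertools import chain

# Pretend three chunks finished out of submission order, as can happen
# when draining futures with cf.as_completed(); each inner list is one
# chunk's matches.
completion_order = [[25, 35], [5, 15], [45, 50]]

# list() keeps arrival order; sorted() restores numeric order.
unordered = list(chain.from_iterable(completion_order))
ordered = sorted(chain.from_iterable(completion_order))

assert unordered == [25, 35, 5, 15, 45, 50]
assert ordered == [5, 15, 25, 35, 45, 50]
```

The only difference between the two return statements in the codes below is exactly this list() vs sorted() call over the flattened iterator.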

Experiment: These two newly improved codes were used to perform the same experiment described in Part 1, save that only 6 pool workers were considered and the Python built-in list and sorted methods were used to return an unordered and an ordered list of the results to the main section of the code, respectively.

Findings:

  1. From the _concurrent method's results, we can see that the compute times of the _concurrent method, used to create all the Future objects of ProcessPoolExecutor.submit() and to create the iterator of ProcessPoolExecutor.map(), are equivalent as a function of the number of discretized tasks over the number of pool workers. This result simply means that the ProcessPoolExecutor methods .submit() and .map() are equally efficient/fast.
  2. Comparing the compute times from main and its _concurrent method, we can see that main ran longer than its _concurrent method. This is to be expected, as their time difference reflects the compute times of the list and sorted methods (and of the other methods encased within them). Clearly, the list method took less compute time to return a result list than the sorted method. The average compute times of the list method for both the .submit() and .map() codes were similar, at ~0.47sec. The average compute times of the sorted method for the .submit() and .map() codes were 1.23sec and 1.01sec, respectively. In other words, the list method performed 2.62 times and 2.15 times faster than the sorted method for the .submit() and .map() codes, respectively.
  3. It is not clear why the sorted method generated an ordered list from .map() faster than from .submit() as the number of discretized tasks increased beyond the number of pool workers, save when the number of discretized tasks equaled the number of pool workers. That said, these findings show that the choice between the equally fast .submit() and .map() methods can hinge on the sorted method. For example, if the intent is to generate an ordered list in the shortest time possible, the use of ProcessPoolExecutor.map() should be preferred over ProcessPoolExecutor.submit(), as .map() can allow the shortest total compute time.
  4. The discretization scheme mentioned in Part 1 of my answer is shown here to speed up the performance of both the .submit() and .map() methods. The amount of speed-up can be as much as 20% over the case when the number of discretized tasks equals the number of pool workers.

Improved .map() code

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

import concurrent.futures as cf
from time import time
from itertools import repeat, chain 


def _findmatch(nmin, nmax, number):
    '''Function to find the occurrence of number in range nmin to nmax and return
       the found occurrences in a list.'''
    start = time()
    match=[]
    for n in range(nmin, nmax):
        if number in str(n):
            match.append(n)
    end = time() - start
    #print("\n def _findmatch {0:<10} {1:<10} {2:<3} found {3:8} in {4:.4f}sec".
    #      format(nmin, nmax, number, len(match),end))
    return match

def _concurrent(nmax, number, workers, num_of_chunks):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a concurrent
       manner.'''
    # 1. Local variables
    start = time()
    chunksize = nmax // num_of_chunks
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        cstart = (chunksize * i for i in range(num_of_chunks))
        cstop = (chunksize * i if i != num_of_chunks else nmax
                 for i in range(1, num_of_chunks + 1))
        futures = executor.map(_findmatch, cstart, cstop, repeat(number))
    end = time() - start
    print('\n within statement of def _concurrent_map(nmax, number, workers, num_of_chunks):')
    print("found in {0:.4f}sec".format(end))
    return list(chain.from_iterable(futures)) #Return an unordered result list
    #return sorted(chain.from_iterable(futures)) #Return an ordered result list

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers
    chunks_vs_workers = 30 # A factor of =>14 can provide optimum performance 
    num_of_chunks = chunks_vs_workers * workers

    start = time()
    found = _concurrent(nmax, number, workers, num_of_chunks)
    end = time() - start
    print('\n main')
    print('nmax={}, workers={}, num_of_chunks={}'.format(
          nmax, workers, num_of_chunks))
    #print('found = ', found)
    print("found {0} in {1:.4f}sec".format(len(found),end))    

Improved .submit() code.
This code is the same as the .map() code, except that you replace the _concurrent method with the following:

def _concurrent(nmax, number, workers, num_of_chunks):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.submit to
       find the occurrences of a given number in a number range in a concurrent
       manner.'''
    # 1. Local variables
    start = time()
    chunksize = nmax // num_of_chunks
    futures = []
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(num_of_chunks):
            cstart = chunksize * i
            cstop = chunksize * (i + 1) if i != num_of_chunks - 1 else nmax
            futures.append(executor.submit(_findmatch, cstart, cstop, number))
    end = time() - start
    print('\n within statement of def _concurrent_submit(nmax, number, workers, num_of_chunks):')
    print("found in {0:.4f}sec".format(end))
    return list(chain.from_iterable(f.result() for f in cf.as_completed(
        futures))) #Return an unordered list
    #return sorted(chain.from_iterable(f.result() for f in cf.as_completed(
    #    futures))) #Return an ordered list

=======================================================================
