Parallelize this nested for loop in Python

Problem description

I'm struggling again to improve the execution time of this piece of code. Since the calculations are really time-consuming I think that the best solution would be to parallelize the code.

I was first working with maps as explained in this question, but then I tried a simpler approach, thinking that I could find a better solution. However, I haven't come up with anything yet, and since it's a different problem I decided to post it as a new question.

I am working on a Windows platform, using Python 3.4.

The code is as follows:

similarity_matrix = [[0 for x in range(word_count)] for x in range(word_count)]
for i in range(0, word_count):
    for j in range(0, word_count):
        if i > j:
            similarity = calculate_similarity(t_matrix[i], t_matrix[j])
            similarity_matrix[i][j] = similarity
            similarity_matrix[j][i] = similarity

This is the calculate_similarity function:

def calculate_similarity(array_word1, array_word2):
      denominator = sum([array_word1[i] + array_word2[i] for i in range(word_count)])
      if denominator == 0:
          return 0
      numerator = sum([2 * min(array_word1[i], array_word2[i]) for i in range(word_count)])
      return numerator / denominator

And an explanation of the code:

  • word_count is the total number of unique words stored in a list
  • t_matrix is a matrix containing a value for each pair of words
  • the output should be similarity_matrix whose dimension is word_count x word_count also containing a similarity value for each pair of words
  • it's ok to keep both matrices in memory
  • after these computations I can easily find the most similar word for each word (or the top three similar words, as the task may require)
  • calculate_similarity takes two float lists, each for a separate word (each is a row in the t_matrix)
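To make the description above concrete, here is a minimal sketch that runs the same computation on a tiny, made-up t_matrix of three words. The data values and the most_similar list are assumptions added for illustration; calculate_similarity uses the same formula as in the question, only written with zip so it does not depend on a global word_count.

```python
def calculate_similarity(array_word1, array_word2):
    # Same formula as in the question; zip replaces indexing by word_count.
    denominator = sum(a + b for a, b in zip(array_word1, array_word2))
    if denominator == 0:
        return 0
    numerator = sum(2 * min(a, b) for a, b in zip(array_word1, array_word2))
    return numerator / denominator


# Hypothetical data: three words, one row of floats per word.
t_matrix = [
    [1.0, 2.0, 0.0],
    [1.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
]
word_count = len(t_matrix)

# Fill the symmetric matrix, visiting each pair with j < i once.
similarity_matrix = [[0] * word_count for _ in range(word_count)]
for i in range(word_count):
    for j in range(i):
        similarity = calculate_similarity(t_matrix[i], t_matrix[j])
        similarity_matrix[i][j] = similarity
        similarity_matrix[j][i] = similarity

# Index of the most similar *other* word for each word.
most_similar = [
    max((j for j in range(word_count) if j != i),
        key=lambda j: similarity_matrix[i][j])
    for i in range(word_count)
]
```

For words 0 and 1 the denominator is 7 and the numerator is 2, so their similarity is 2/7; the all-zero row has similarity 0 with every other row.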

I work with a list of 13k words, and if I calculated correctly the execution time on my system would be a few days. So, anything that will do the job in one day would be wonderful!
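A rough back-of-the-envelope count shows where the time goes. This sketch assumes exactly 13000 words and counts only the list elements that calculate_similarity builds; it says nothing about the actual speed of any particular machine.

```python
word_count = 13000

# Unordered pairs (i, j) with i > j: one calculate_similarity call each.
pair_count = word_count * (word_count - 1) // 2

# Each call builds two lists of word_count elements
# (one for the denominator, one for the numerator).
element_ops = pair_count * 2 * word_count

# Time budget per pair if the whole job has to finish within one day.
seconds_per_pair = 24 * 60 * 60 / pair_count

print(pair_count)        # 84493500
print(element_ops)       # 2196831000000
print(seconds_per_pair)
```

So there are about 84.5 million pairs, and finishing in one day leaves roughly one millisecond per pair.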

Maybe only parallelizing the calculation of numerator and denominator in calculate_similarity would make a significant improvement.

Recommended answer

from concurrent.futures import ProcessPoolExecutor, wait
from itertools import combinations
from functools import partial

similarity_matrix = [[0]*word_count for _ in range(word_count)]

def callback(i, j, future):
    # Runs in the parent process when a worker finishes; fill both halves.
    similarity = future.result()
    similarity_matrix[i][j] = similarity
    similarity_matrix[j][i] = similarity

# The __main__ guard is needed on Windows, where worker processes are spawned.
if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=4) as executor:
        fs = []
        # combinations yields each unordered pair (i, j) exactly once
        for i, j in combinations(range(word_count), 2):
            future = executor.submit(
                        calculate_similarity,
                        t_matrix[i],
                        t_matrix[j])

            future.add_done_callback(partial(callback, i, j))
            fs.append(future)

        wait(fs)
