Parallelize this nested for loop in python

Question
I'm struggling again to improve the execution time of this piece of code. Since the calculations are really time-consuming I think that the best solution would be to parallelize the code.
I was first working with maps as explained in this question, but then I tried a simpler approach, thinking that I could find a better solution. However, I haven't come up with anything yet, so since it's a different problem I decided to post it as a new question.
I am working on a Windows platform, using Python 3.4.
Here's the code:
similarity_matrix = [[0 for x in range(word_count)] for x in range(word_count)]
for i in range(0, word_count):
    for j in range(0, word_count):
        if i > j:
            similarity = calculate_similarity(t_matrix[i], t_matrix[j])
            similarity_matrix[i][j] = similarity
            similarity_matrix[j][i] = similarity
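As a side note, the i > j guard means each unordered pair of rows is computed exactly once; the same pairs can be enumerated with itertools.combinations. A minimal check, using a made-up toy word_count just for illustration:

```python
from itertools import combinations

word_count = 4  # toy size, for illustration only

# The pairs visited by the nested loop with the i > j guard
pairs_nested = [(i, j) for i in range(word_count) for j in range(word_count) if i > j]

# The same unordered pairs via combinations (swapped so i > j)
pairs_comb = [(j, i) for i, j in combinations(range(word_count), 2)]

print(sorted(pairs_nested) == sorted(pairs_comb))  # True
```

This is the iteration scheme the answer below relies on.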
And this is the calculate_similarity function:
def calculate_similarity(array_word1, array_word2):
    denominator = sum([array_word1[i] + array_word2[i] for i in range(word_count)])
    if denominator == 0:
        return 0
    numerator = sum([2 * min(array_word1[i], array_word2[i]) for i in range(word_count)])
    return numerator / denominator
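For a quick sanity check of the metric, here is a self-contained variant of the function (taking the length from the input lists rather than the global word_count, which is a small change for this sketch) applied to toy rows: identical rows score 1.0, disjoint rows score 0.0.

```python
def calculate_similarity(array_word1, array_word2):
    # Self-contained variant: derive the length locally instead of
    # using the global word_count from the question.
    word_count = len(array_word1)
    denominator = sum(array_word1[i] + array_word2[i] for i in range(word_count))
    if denominator == 0:
        return 0
    numerator = sum(2 * min(array_word1[i], array_word2[i]) for i in range(word_count))
    return numerator / denominator

print(calculate_similarity([1.0, 0.0], [1.0, 0.0]))  # identical rows -> 1.0
print(calculate_similarity([1.0, 0.0], [0.0, 1.0]))  # disjoint rows -> 0.0
print(calculate_similarity([0.0, 0.0], [0.0, 0.0]))  # all-zero rows -> 0
```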
And an explanation of the code:

- word_count is the total number of unique words stored in a list
- t_matrix is a matrix containing a value for each pair of words
- the output should be similarity_matrix, whose dimension is word_count x word_count, also containing a similarity value for each pair of words
- it's ok to keep both matrices in memory
- after these computations I can easily find the most similar word for each word (or the top three similar words, as the task may require)
- calculate_similarity takes two float lists, each for a separate word (each is a row in t_matrix)
I work with a list of 13k words, and if I calculated correctly the execution time on my system would be a few days. So, anything that will do the job in one day would be wonderful!
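Before reaching for multiple processes, it may be worth noting that the per-pair arithmetic itself can be vectorized with NumPy, assuming t_matrix fits in a 2-D array (an assumption not stated in the question). A sketch that processes one row at a time to keep memory bounded, shown on toy data:

```python
import numpy as np

# Toy stand-in for t_matrix (3 "words", 3 features each)
A = np.array([[1.0, 0.0, 2.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
n = A.shape[0]
row_sums = A.sum(axis=1)

sim = np.zeros((n, n))
for i in range(n):
    # Vectorized over all j at once: 2 * sum(min) / (sum_i + sum_j)
    numer = 2 * np.minimum(A[i], A).sum(axis=1)
    denom = row_sums[i] + row_sums
    # where= guards the all-zero case, matching the question's return 0
    np.divide(numer, denom, out=sim[i], where=denom != 0)

print(sim[0, 1])  # 0.4 for these toy rows
```

This removes the inner Python-level loop entirely; the remaining loop over i can still be parallelized as in the answer below.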
Maybe only parallelizing the calculation of numerator and denominator in calculate_similarity would make a significant improvement.
Answer
from concurrent.futures import ProcessPoolExecutor, wait
from itertools import combinations
from functools import partial

similarity_matrix = [[0] * word_count for _ in range(word_count)]

def callback(i, j, future):
    # Fill both symmetric entries once the worker finishes
    similarity = future.result()
    similarity_matrix[i][j] = similarity
    similarity_matrix[j][i] = similarity

with ProcessPoolExecutor(max_workers=4) as executor:
    fs = []
    for i, j in combinations(range(word_count), 2):
        future = executor.submit(
            calculate_similarity,
            t_matrix[i],
            t_matrix[j])
        future.add_done_callback(partial(callback, i, j))
        fs.append(future)
    wait(fs)
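To see the callback pattern end to end on toy data, here is a runnable sketch. It uses ThreadPoolExecutor instead of ProcessPoolExecutor purely so the snippet is self-contained (both share the same Executor API, and a process pool would require calculate_similarity to be importable at module level for pickling); the data and sizes are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor, wait
from itertools import combinations
from functools import partial

# Toy stand-in for t_matrix
t_matrix = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
word_count = len(t_matrix)

def calculate_similarity(a, b):
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        return 0
    return sum(2 * min(x, y) for x, y in zip(a, b)) / denominator

similarity_matrix = [[0] * word_count for _ in range(word_count)]

def callback(i, j, future):
    # Mirror the result into both symmetric entries
    similarity_matrix[i][j] = similarity_matrix[j][i] = future.result()

with ThreadPoolExecutor(max_workers=4) as executor:
    fs = []
    for i, j in combinations(range(word_count), 2):
        future = executor.submit(calculate_similarity, t_matrix[i], t_matrix[j])
        future.add_done_callback(partial(callback, i, j))
        fs.append(future)
    wait(fs)

print(similarity_matrix[0][1] == similarity_matrix[1][0])  # True
```

Note that, as in the answer, the diagonal entries are never submitted and stay 0; with 13k words this scheme submits about 84 million futures, so in practice batching pairs per task may be worth considering.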