Parallel execution and file writing in Python


Problem Description

I have a very large dataset distributed across 10 big clusters. The task is to do some computation for each cluster and append the results, line by line, to 10 files, where each file contains the results for one of the 10 clusters. Each cluster can be computed independently, and I want to parallelize the code across ten CPUs (or threads) so that the computation runs on all the clusters at once. Simplified pseudocode for my task is the following:

for c in range(10):  # loop over the 10 clusters
    with open("cluster_%d" % c, "a") as out:          # one output file per cluster
        for line in read_lines_from_cluster(c):       # placeholder: iterate over the lines of cluster c
            result = compute(line)                     # placeholder: per-line computation
            out.write(result + "\n")                   # append the result for this line

Recommended Answer

#!/usr/bin/env python
from multiprocessing import Pool

def compute_cluster(c):
    """Each cluster can be computed independently."""
    ...  # compute cluster c here and append its results to its own file

if __name__ == "__main__":
    pool = Pool(10)  # run at most 10 tasks in parallel
    pool.map(compute_cluster, range(10))
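
The answer leaves the body of compute_cluster as a stub. A minimal sketch of what it could look like for this task is given below, reusing the hypothetical placeholders read_lines_from_cluster and compute from the pseudocode above; the input file naming is an assumption you would adjust to your own data layout. Because each worker process appends only to its own cluster_<c> file, the processes never share an output file and no locking is needed.

def read_lines_from_cluster(c):
    """Hypothetical placeholder: yield the lines that belong to cluster c."""
    with open("cluster_%d_input" % c) as f:   # assumed input file naming
        for line in f:
            yield line.rstrip("\n")

def compute(line):
    """Hypothetical placeholder for the per-line computation."""
    return line.upper()

def compute_cluster(c):
    """Process one cluster and append its results, line by line, to its own file."""
    with open("cluster_%d" % c, "a") as out:      # each worker writes only to its own file,
        for line in read_lines_from_cluster(c):  # so writes never interleave across clusters
            out.write(compute(line) + "\n")

With this in place, pool.map(compute_cluster, range(10)) dispatches one cluster index to each worker process, and up to 10 clusters are processed and written out concurrently.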
