Make a for loop execute in parallel over Pandas columns

Problem description

Please help me convert the code below to run in parallel. I am trying to map a nested dictionary onto Pandas column values. The code works correctly but takes a lot of time, so I am looking to parallelize the for loop. (Note: df.replace(Source_Dictionary) also does the job, but takes three times as long as the code below.)

import pandas as pd

df = pd.DataFrame({'one':['bab'],'two':['abb'],'three':['bb']})
Source_Dictionary = {'one':{'dadd':1,'bab':1.5},
                    'two':{'ab':2},
                    'three':{'cc':1,'bb':3}}
required_columns = ['one','two','three']
def Feature_Map(x):
    df[x] = df[x].map(Source_Dictionary[x]).fillna(0)

for i in required_columns:
    Feature_Map(i)
print(df)
   one  two  three
0  1.5  0.0      3

Recommended answer

To speed up execution you can use the multiprocessing module. The number of workers and the resulting performance depend on the resources available. Suppose you can afford 4 workers running in parallel (note that ThreadPool, used below, runs them as threads inside a single process).

Your function:

def Feature_Map(x):
    df[x] = df[x].map(Source_Dictionary[x]).fillna(0)

Multiprocessing:

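from multiprocessing.pool import ThreadPool

# ThreadPool runs tasks in worker threads; processes=4 sets the number of workers
pool = ThreadPool(processes=4)
for i in required_columns:
    # args must be a tuple: (i,) and not (i), which is just the string i
    pool.apply_async(Feature_Map, (i,))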

You can also add code that waits until all submitted tasks have finished before the program exits.
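One way to do that waiting is sketched here, using the question's data. It calls pool.close() and pool.join(), then get() on each result so that any exception raised in a worker is surfaced instead of being silently dropped. The helper name map_column is hypothetical, and returning the mapped column to the main thread (rather than assigning into df from the workers) is an assumption made here to avoid concurrent writes to the same DataFrame; it is not part of the original answer.

```python
from multiprocessing.pool import ThreadPool

import pandas as pd

df = pd.DataFrame({'one': ['bab'], 'two': ['abb'], 'three': ['bb']})
Source_Dictionary = {'one': {'dadd': 1, 'bab': 1.5},
                     'two': {'ab': 2},
                     'three': {'cc': 1, 'bb': 3}}
required_columns = ['one', 'two', 'three']


def map_column(name):
    # Hypothetical helper: returns the mapped column instead of writing to df,
    # so that worker threads only read from the shared DataFrame.
    return name, df[name].map(Source_Dictionary[name]).fillna(0)


pool = ThreadPool(processes=4)
async_results = [pool.apply_async(map_column, (name,))
                 for name in required_columns]

pool.close()  # no further tasks will be submitted
pool.join()   # block until every submitted task has finished

for result in async_results:
    # get() re-raises any exception that occurred inside the worker
    name, mapped = result.get()
    df[name] = mapped

print(df)
```

Whether this is actually faster than the plain loop depends on the data: the workers are threads, so work that holds Python's global interpreter lock will not run concurrently. It is worth timing both versions on the real DataFrame.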

You can refer to https://docs.python.org/2/library/multiprocessing.html for detailed usage.
