Slow speed while parallelizing operation on pandas dataframe


Problem description

I have a dataframe on which I perform some operations and print out the results. To do this, I have to iterate over each row.

for count, row in final_df.iterrows():
    x = row['param_a']
    y = row['param_b']
    # Perform operation
    # Write to output file

I decided to parallelize this using the Python multiprocessing module:

import multiprocessing

def write_site_files(row):
    x = row['param_a']
    y = row['param_b']
    # Perform operation
    # Write to output file

num_proc = 4                    # number of concurrent worker processes
pkg_num = 0
total_runs = final_df.shape[0]  # total number of rows in final_df
threads = []

while pkg_num < total_runs or len(threads):
    if len(threads) < num_proc and pkg_num < total_runs:
        print(pkg_num, total_runs)
        t = multiprocessing.Process(target=write_site_files,
                                    args=[final_df.iloc[pkg_num]])
        pkg_num = pkg_num + 1
        t.start()
        threads.append(t)
    else:
        # Reap finished processes (iterate over a copy so removal is safe)
        for thread in threads[:]:
            if not thread.is_alive():
                threads.remove(thread)

However, the latter (parallelized) method is much slower than the simple iteration-based approach. Is there anything I am missing?

Thanks!

Answer

This will be far less efficient than doing it in a single process, unless the actual operation takes a long time, e.g. seconds per row.

Normally, parallelization is the last tool in the box: profile first, then vectorize locally, then optimize locally, and only then parallelize.
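For instance, if the per-row operation can be expressed as column arithmetic, vectorizing it often removes the need to parallelize at all. A minimal sketch, reusing the question's `param_a`/`param_b` column names and assuming (hypothetically) that the operation is a simple product:

```python
import pandas as pd

final_df = pd.DataFrame({'param_a': [1.0, 2.0, 3.0],
                         'param_b': [10.0, 20.0, 30.0]})

# Row-by-row (slow): collect row['param_a'] * row['param_b'] via iterrows()
# Vectorized (fast): one expression operating on entire columns at once
result = final_df['param_a'] * final_df['param_b']
print(result.tolist())  # → [10.0, 40.0, 90.0]
```

The vectorized version dispatches the whole computation to compiled code in one call instead of paying Python-level overhead per row.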

You are spending time just doing the slicing, then spinning up new processes (which is generally a constant overhead per process), then pickling a single row (it is not clear from your example how big the rows are).

At the very least, you should chunk the rows, e.g. df.iloc[i*chunksize:(i+1)*chunksize].

There will hopefully be some support for parallel apply in 0.14; see here: https://github.com/pydata/pandas/issues/5751
