Parallel processing in pandas (Python)
Problem description
I have 5,000,000 rows in my dataframe. In my code, I am using iterrows(), which is taking too much time. To get the required output, I have to iterate through all the rows. So I wanted to know whether I can parallelize the code in pandas.
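For context, this is the kind of row-wise loop the question describes; it is slow because iterrows() builds a Series object for every row. The column names and the computation here are hypothetical stand-ins for the real work, shown alongside the vectorized form that usually removes the need for the loop entirely:

```python
import pandas as pd

# Hypothetical data standing in for the real 5,000,000-row frame
df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# Slow pattern: per-row Python loop via iterrows()
total = 0
for _, row in df.iterrows():
    total += row["a"] * row["b"]

# Equivalent vectorized form, typically orders of magnitude faster
total_vec = (df["a"] * df["b"]).sum()
```

If the per-row work can be vectorized like this, it is usually worth trying before reaching for multiprocessing.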
Recommended answer
Here's a webpage I found that might help: http://gouthamanbalaraman.com/blog/distributed-processing-pandas.html
And here's the code for multiprocessing found on that page:
import pandas as pd
import multiprocessing as mp

LARGE_FILE = "D:\\my_large_file.txt"
CHUNKSIZE = 100000  # process 100,000 rows at a time

def process_frame(df):
    # Process a single chunk; here it just counts the rows.
    return len(df)

if __name__ == '__main__':
    # pd.read_table was removed in pandas 2.0; read_csv with sep="\t" is equivalent.
    reader = pd.read_csv(LARGE_FILE, sep="\t", chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 worker processes
    funclist = []
    for df in reader:
        # submit each chunk to the pool asynchronously
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)
    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # wait at most 10 seconds per chunk
    pool.close()
    pool.join()
    print("There are %d rows of data" % result)