How to append a DataFrame to an empty DataFrame using concurrent.futures


Question

I want to run a function using concurrent.futures in Python. This is the code that I have:

import concurrent.futures
import pandas as pd
import time

def putIndf(file):
    listSel = getline(file)
    datFram = savetoDataFrame(listSel)
    return datFram #datatype : dataframe

def main():
    newData = pd.DataFrame()
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        for i,file in zip(fileList, executor.map(dp.putIndf, fileList)):
            df = newData.append(file, ignore_index=True)
    return df

if __name__ == '__main__':
    main()

I want to combine the data frames into a single data frame, newData, but the result contains only the last data frame returned by the function.

Recommended answer

Essentially, you are re-assigning df on each iteration and never growing it: newData stays empty, so every pass overwrites df with just the current data frame. What you probably meant (though it is ill-advised) is to initialize an empty df and append to it iteratively:

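df = pd.DataFrame()
...
# Note: DataFrame.append was removed in pandas 2.0; on newer versions use
# df = pd.concat([df, file], ignore_index=True) instead.
df = df.append(file, ignore_index=True)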
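
To make the difference concrete, here is a minimal sketch of both loop shapes. The three small frames and the column name x are made up for illustration, and pd.concat stands in for DataFrame.append so the snippet also runs on pandas 2.0 and later, where append no longer exists.

```python
import pandas as pd

# Made-up frames standing in for the per-file results.
frames = [
    pd.DataFrame({"x": [1, 2]}),
    pd.DataFrame({"x": [3]}),
    pd.DataFrame({"x": [4, 5, 6]}),
]

# Shape of the original code: `empty` is never updated,
# so each pass overwrites `last_only` with just the current frame.
empty = pd.DataFrame()
for frame in frames:
    last_only = pd.concat([empty, frame], ignore_index=True)

# Accumulating shape: the same name is read and re-assigned on every pass.
grown = pd.DataFrame()
for frame in frames:
    grown = pd.concat([grown, frame], ignore_index=True)

print(len(last_only))  # 3 rows: only the last frame
print(len(grown))      # 6 rows: all frames
```

The second loop gives the intended result, but it copies everything accumulated so far on each pass, which is why it is still not the recommended shape.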

Nonetheless, the preferred method is to collect the data frames and concatenate them all at once outside the loop, and to avoid growing complex objects such as data frames inside a loop, since every append copies all the data accumulated so far.

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        # LIST COMPREHENSION
        df_list = [file for i,file in zip(fileList, executor.map(dp.putIndf, fileList))]

        # DICTIONARY COMPREHENSION
        # df_dict = {i:file for i,file in zip(fileList, executor.map(dp.putIndf, fileList))}

    df = pd.concat(df_list, ignore_index=True)
    return df
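
The commented-out dictionary variant is useful when you need to know which file each row came from. The following is only a sketch of how such a dict might be combined: the file names and the column x are hypothetical. It relies on pd.concat accepting a mapping and using its keys as the outer index level.

```python
import pandas as pd

# Hypothetical stand-in for the {file_name: DataFrame} dict built from executor.map.
df_dict = {
    "a.csv": pd.DataFrame({"x": [1, 2]}),
    "b.csv": pd.DataFrame({"x": [3]}),
}

# Passing a mapping makes the dict keys the outer level of the resulting index.
combined = pd.concat(df_dict, names=["source", "row"])

# Move the source level into an ordinary column and discard the per-file row numbers.
flat = combined.reset_index(level="source").reset_index(drop=True)
print(flat)
```

With the list variant and ignore_index=True, that origin information is discarded, so pick the dict form only if you need it.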

Alternatively, since you are already looping over the pool's results, append the data frames to a list inside the loop and still concatenate only once, outside the loop:

def main():
    df_list = []      # df_dict = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        for i,file in zip(fileList, executor.map(dp.putIndf, fileList)):
            df_list.append(file)
            # df_dict[i] = file

    df = pd.concat(df_list, ignore_index=True)
    return df
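
For reference, here is a self-contained sketch of the whole pattern that can be run as-is. The worker load_frame and its columns are hypothetical stand-ins for putIndf, and ThreadPoolExecutor is swapped in for ProcessPoolExecutor purely so the example runs in any environment; both executors expose the same map() method.

```python
import concurrent.futures

import pandas as pd


def load_frame(name):
    """Hypothetical worker standing in for putIndf: returns one DataFrame per input."""
    return pd.DataFrame({"source": [name] * 2, "value": [len(name), len(name) * 2]})


def combine(names, max_workers=4):
    # ThreadPoolExecutor is an assumption made for portability of this sketch;
    # the question uses ProcessPoolExecutor, which is called the same way.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the inputs.
        frames = list(executor.map(load_frame, names))
    # Concatenate once, outside the loop.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    print(combine(["a.txt", "bb.txt", "ccc.txt"]))
```

Note that pd.concat raises a ValueError when given an empty list, so guard against an empty input list if that can happen in your case.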
