Merge based on multiple columns of all Excel files from a directory in Python

Problem description

Say I have a dataframe df and a directory ./ that contains the following Excel files:

import os

path = './'
# Walk the directory tree and print the path of every Excel file found.
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith(('.xls', '.xlsx')):
            print(os.path.join(root, file))
            # dfs.append(read_dfs(os.path.join(root, file)))
# df = reduce(lambda left, right: pd.concat([left, right], axis = 0), dfs)

Out:

df1.xlsx,
df2.xlsx,
df3.xls
...

I want to merge df with all the files under path on the common columns date and city. The following code works, but it is not concise enough.

So I am asking how to improve the code. Thank you.

df = pd.merge(df, df1, on = ['date', 'city'], how='left')
df = pd.merge(df, df2, on = ['date', 'city'], how='left')
df = pd.merge(df, df3, on = ['date', 'city'], how='left')
...

Reference:

pandas three-way joining multiple dataframes on columns

Recommended answer

The following code may work:

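from functools import reduce

import pandas as pd

# df0 ... dfN are the dataframes to combine; they are merged pairwise from left to right.
dfs = [df0, df1, df2, dfN]
# pd.merge defaults to how='inner'; pass how='left' to keep every row of the first frame.
df_final = reduce(lambda left, right: pd.merge(left, right, on=['date', 'city']), dfs)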
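To tie the answer back to the directory in the question, here is a minimal sketch of how the file discovery and the reduce-based merge might be combined. The helper names find_excel_files and merge_all, the KEYS constant, and the sample data are all made up for illustration and do not come from the original post. The sketch assumes every file contains the date and city columns with matching dtypes, and that the other column names do not collide between files.

```python
import os
from functools import reduce

import pandas as pd

KEYS = ["date", "city"]  # common columns named in the question


def find_excel_files(path):
    """Return the sorted paths of all .xls/.xlsx files under path (hypothetical helper)."""
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name.endswith((".xls", ".xlsx")):
                found.append(os.path.join(root, name))
    return sorted(found)


def merge_all(base, frames, keys=KEYS):
    """Left-merge each frame onto base in turn, keeping every row of base (hypothetical helper)."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=keys, how="left"),
        frames,
        base,  # initial value, so an empty list of frames returns base unchanged
    )


# Small in-memory demonstration with made-up data; no Excel files are needed.
df = pd.DataFrame({"date": ["2020-01-01", "2020-01-02"], "city": ["bj", "sh"]})
df1 = pd.DataFrame({"date": ["2020-01-01"], "city": ["bj"], "price": [10]})
df2 = pd.DataFrame({"date": ["2020-01-02"], "city": ["sh"], "volume": [5]})
merged = merge_all(df, [df1, df2])
print(merged)
```

With real files, the list of frames could be built as [pd.read_excel(p) for p in find_excel_files('./')] and passed to merge_all(df, ...). Note that pd.read_excel needs an Excel engine installed (for example openpyxl for .xlsx), and that date columns read from Excel may need converting so that their dtype matches the one in df.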
