Pandas: merge multiple dataframes and control column names?


Question

I would like to merge nine Pandas dataframes into a single dataframe, joining on two columns and controlling the column names. Is this possible?

I have nine datasets. All of them have the following columns:

org, name, items, spend

I want to join them into a single dataframe with the following columns:

org, name, items_df1, spend_df1, items_df2, spend_df2, items_df3...

I've been reading the documentation on merging and joining. I can currently merge two datasets like this:

aggregate_data = pd.DataFrame.merge(df_presents, df_trees,
                                    on=['org', 'name'],
                                    suffixes=['_presents', '_trees'])

This works great; doing print list(aggregate_data.columns.values) shows me the following columns:

[u'org', u'name', u'spend_presents', u'items_presents', u'spend_trees', u'items_trees'...]

But how can I do this for nine dataframes? merge only seems to accept two at a time, and if I merge them sequentially, my column names will end up very messy.

Recommended answer

You could use functools.reduce to iteratively apply pd.merge to each of the DataFrames:

result = functools.reduce(merge, dfs)

This is equivalent to:

result = dfs[0]
for df in dfs[1:]:
    result = merge(result, df)
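
As a quick check of that equivalence, here is a minimal sketch on three tiny made-up frames (the data and the names via_reduce and via_loop are illustrative, not from the question). It calls pd.merge directly, relying on its default of joining on the columns the two frames have in common.

```python
import functools
import pandas as pd

# Three tiny illustrative frames that share the key columns 'org' and 'name'.
dfs = [
    pd.DataFrame({'org': ['a', 'b'], 'name': ['x', 'y'], 'items_df1': [1, 2]}),
    pd.DataFrame({'org': ['a', 'b'], 'name': ['x', 'y'], 'items_df2': [3, 4]}),
    pd.DataFrame({'org': ['a', 'b'], 'name': ['x', 'y'], 'items_df3': [5, 6]}),
]

# reduce folds the list from the left: merge(merge(dfs[0], dfs[1]), dfs[2]).
via_reduce = functools.reduce(pd.merge, dfs)

# The same fold written as an explicit loop.
via_loop = dfs[0]
for df in dfs[1:]:
    via_loop = pd.merge(via_loop, df)

print(via_reduce.equals(via_loop))
print(list(via_reduce.columns))
```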

To pass the on=['org', 'name'] argument, you could use functools.partial to define the merge function:

merge = functools.partial(pd.merge, on=['org', 'name'])
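
To illustrate what the partial object does, the sketch below uses two small made-up frames (left and right are hypothetical names). It shows that the pre-bound on argument applies to every call, while other keyword arguments such as how can still be supplied per call.

```python
import functools
import pandas as pd

# Illustrative frames: only the ('a', 'x') key appears in both.
left = pd.DataFrame({'org': ['a', 'b'], 'name': ['x', 'y'], 'items_df1': [1, 2]})
right = pd.DataFrame({'org': ['a', 'c'], 'name': ['x', 'z'], 'items_df2': [3, 4]})

# Bind the join keys once; the result behaves like pd.merge with `on` preset.
merge = functools.partial(pd.merge, on=['org', 'name'])

# Default join type is inner, so only keys present in both frames survive.
inner = merge(left, right)

# Extra keyword arguments are forwarded to pd.merge alongside the bound `on`.
outer = merge(left, right, how='outer')

print(inner.equals(pd.merge(left, right, on=['org', 'name'])))
print(len(inner), len(outer))
```

Because pd.merge defaults to an inner join, rows whose key is missing from any one of the nine frames would be dropped by the reduce; binding how='outer' in the partial is one way to keep them, if that is what you want.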

Specifying the suffixes parameter in functools.partial would fix a single pair of suffixes for every call, whereas here each pd.merge call needs a different suffix. I think the easiest approach is therefore to rename the DataFrames' columns before calling pd.merge:

for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
              inplace=True)
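
Note that inplace=True modifies the original DataFrames. If you would rather leave them untouched, a possible variant is sketched below; the helper name with_suffix and the sample data are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def with_suffix(df, i, value_cols=('items', 'spend')):
    """Return a renamed copy of df whose value columns end in _df<i>."""
    # Without inplace=True, rename returns a new DataFrame.
    return df.rename(columns={col: '{}_df{}'.format(col, i) for col in value_cols})

# Illustrative input frames with the question's four columns.
dfs = [pd.DataFrame({'org': ['a'], 'name': ['x'], 'items': [n], 'spend': [n * 10]})
       for n in range(3)]

renamed = [with_suffix(df, i) for i, df in enumerate(dfs, start=1)]

print(list(renamed[1].columns))  # suffixed copy
print(list(dfs[1].columns))      # original is unchanged
```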


For example,

import pandas as pd
import numpy as np
import functools
np.random.seed(2015)

N = 50
dfs = [pd.DataFrame(np.random.randint(5, size=(N,4)), 
                    columns=['org', 'name', 'items', 'spend']) for i in range(9)]
for i, df in enumerate(dfs, start=1):
    df.rename(columns={col:'{}_df{}'.format(col, i) for col in ('items', 'spend')}, 
              inplace=True)
merge = functools.partial(pd.merge, on=['org', 'name'])
result = functools.reduce(merge, dfs)
print(result.head())

yields

   org  name  items_df1  spend_df1  items_df2  spend_df2  items_df3  \
0    2     4          4          2          3          0          1   
1    2     4          4          2          3          0          1   
2    2     4          4          2          3          0          1   
3    2     4          4          2          3          0          1   
4    2     4          4          2          3          0          1   

   spend_df3  items_df4  spend_df4  items_df5  spend_df5  items_df6  \
0          3          1          0          1          0          4   
1          3          1          0          1          0          4   
2          3          1          0          1          0          4   
3          3          1          0          1          0          4   
4          3          1          0          1          0          4   

   spend_df6  items_df7  spend_df7  items_df8  spend_df8  items_df9  spend_df9  
0          3          4          1          3          0          1          2  
1          3          4          1          3          0          0          3  
2          3          4          1          3          0          0          0  
3          3          3          1          3          0          1          2  
4          3          3          1          3          0          0          3  
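
The repeated (org, name) values in this output arise because the random keys in the example are not unique: when a key occurs several times on both sides, pd.merge returns every matching combination of rows. The sketch below, on made-up data, shows the effect and one way you might check your own frames for duplicate keys before merging.

```python
import pandas as pd

# Both frames contain the key (2, 4) twice.
left = pd.DataFrame({'org': [2, 2], 'name': [4, 4], 'items_df1': [4, 1]})
right = pd.DataFrame({'org': [2, 2], 'name': [4, 4], 'items_df2': [3, 0]})

# Each left row pairs with each right row that has the same key.
merged = pd.merge(left, right, on=['org', 'name'])
print(len(merged))

# duplicated() flags rows whose (org, name) pair already appeared earlier.
has_duplicate_keys = left.duplicated(subset=['org', 'name']).any()
print(has_duplicate_keys)
```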
