合并两个 pandas 数据帧会导致“重复"操作.列 [英] Merging two pandas dataframes results in "duplicate" columns

查看:75
本文介绍了合并两个 pandas 数据帧会导致“重复"操作.列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试合并两个包含相同键列的数据框.其他一些列也具有相同的标题,尽管行数不相等,并且合并后,这些列将与原始标题复制"在一起,并带有后缀_x,_y等.

I'm trying to merge two dataframes which contain the same key column. Some of the other columns also have identical headers, although not an equal number of rows, and after merging these columns are "duplicated" with the original headers given a postscript _x, _y, etc.

有人知道如何让熊猫在下面的示例中删除重复的列吗?

Does anyone know how to get pandas to drop the duplicate columns in the example below?

这是我的python代码:

This is my python code:

import pandas as pd

holding_df = pd.read_csv('holding.csv')
invest_df = pd.read_csv('invest.csv')

merge_df = pd.merge(holding_df, invest_df, on='key', how='left').fillna(0)
merge_df.to_csv('merged.csv', index=False)

并且CSV文件包含以下内容:

And the CSV files contain this:

左数据帧(holding_df)的第一行

First rows of left-dataframe (holding_df)

key, dept_name, res_name, year, need, holding
DeptA_ResA_2015, DeptA, ResA, 2015, 1, 1
DeptA_ResA_2016, DeptA, ResA, 2016, 1, 1
DeptA_ResA_2017, DeptA, ResA, 2017, 1, 1
...

右侧数据框(invest_df)

Right-dataframe (invest_df)

key, dept_name, res_name, year, no_of_inv, inv_cost_wo_ice
DeptA_ResA_2015, DeptA, ResA, 2015, 1, 1000000
DeptA_ResB_2015, DeptA, ResB, 2015, 2, 6000000
DeptB_ResB_2015, DeptB, ResB, 2015, 1, 6000000
...

合并结果

key, dept_name_x, res_name_x, year_x, need, holding, dept_name_y, res_name_y, year_y, no_of_inv, inv_cost_wo_ice
DeptA_ResA_2015, DeptA, ResA, 2015, 1, 1, DeptA, ResA, 2015.0, 1.0, 1000000.0
DeptA_ResA_2016, DeptA, ResA, 2016, 1, 1, 0, 0, 0.0, 0.0, 0.0
DeptA_ResA_2017, DeptA, ResA, 2017, 1, 1, 0, 0, 0.0, 0.0, 0.0
DeptA_ResA_2018, DeptA, ResA, 2018, 1, 1, 0, 0, 0.0, 0.0, 0.0
DeptA_ResA_2019, DeptA, ResA, 2019, 1, 1, 0, 0, 0.0, 0.0, 0.0
...

推荐答案

您添加带有后缀'_x'和'_y'的其他列的原因是因为您要合并的列没有匹配的值,因此此冲突会产生其他列.在这种情况下,您需要删除其他"_y"列并重命名"_x"列:

The reason you have additional columns with suffixes '_x' and '_y' is because the columns you are merging do not have matching values so this clash produces additional columns. In that case you need to drop the additional '_y' columns and rename the '_x' columns:

In [145]:
# define our drop function
def drop_y(df):
    # list comprehension of the cols that end with '_y'
    to_drop = [x for x in df if x.endswith('_y')]
    df.drop(to_drop, axis=1, inplace=True)

drop_y(merged)
merged
Out[145]:
               key  dept_name_x  res_name_x   year_x   need   holding  \
0  DeptA_ResA_2015        DeptA        ResA     2015      1         1   
1  DeptA_ResA_2016        DeptA        ResA     2016      1         1   
2  DeptA_ResA_2017        DeptA        ResA     2017      1         1   

    no_of_inv   inv_cost_wo_ice  
0           1           1000000  
1           0                 0  
2           0                 0  
In [146]:
# func to rename '_x' cols
def rename_x(df):
    for col in df:
        if col.endswith('_x'):
            df.rename(columns={col:col.rstrip('_x')}, inplace=True)
rename_x(merged)
merged
Out[146]:
               key  dept_name  res_name   year   need   holding   no_of_inv  \
0  DeptA_ResA_2015      DeptA      ResA   2015      1         1           1   
1  DeptA_ResA_2016      DeptA      ResA   2016      1         1           0   
2  DeptA_ResA_2017      DeptA      ResA   2017      1         1           0   

    inv_cost_wo_ice  
0           1000000  
1                 0  
2                 0 

编辑 如果您将通用列添加到合并中,那么除非这些列上的匹配项不匹配,否则不应产生重复的列:

EDIT If you added the common columns to your merge then it shouldn't produce the duplicated columns unless the matches on those columns do not match:

merge_df = pd.merge(holding_df, invest_df, on=['key', 'dept_name', 'res_name', 'year'], how='left').fillna(0)

这篇关于合并两个 pandas 数据帧会导致“重复"操作.列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆