Pandas Merge - How to avoid duplicating columns


Problem description

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). Some columns appear in both frames (currency and adj date, for example).

What is the best way to merge these by index without taking two copies of currency and adj date?

Each data frame has 90 columns, so I am trying to avoid writing everything out by hand.

df:                 currency  adj_date   data_col1 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45
...

df2:                currency  adj_date   data_col2 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45
...

If I do:

dfNew = pd.merge(df, df2, left_index=True, right_index=True, how='outer')  # assumes: import pandas as pd

I get:

dfNew:              currency_x  adj_date_x   data_col2 ... currency_y adj_date_y
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45             USD         2012-01-03

Thanks!...

Recommended answer

You can work out which columns of df2 are not already in df, and use that to select a subset of df2's columns for the merge.

cols_to_use = df2.columns.difference(df.columns)

Then perform the merge (note that cols_to_use is an Index object, but it has a handy tolist() method if you need a plain list).

dfNew = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')  # assumes: import pandas as pd

This avoids any column name clashes in the merge, so no _x / _y suffixed copies are created.
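To make the approach concrete, here is a minimal, self-contained sketch. The two single-row frames are made up to mirror the layout shown in the question; they are not the asker's real data. It also assumes pandas is imported as `pd`, and the name `df_new` is only illustrative.

```python
import pandas as pd

# Hypothetical sample data shaped like the frames in the question:
# a two-level (date, cusip) index and two columns shared by both frames.
index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2012-01-01"), "XSDP")], names=["date", "cusip"]
)
df = pd.DataFrame(
    {
        "currency": ["USD"],
        "adj_date": [pd.Timestamp("2012-01-03")],
        "data_col1": [0.45],
    },
    index=index,
)
df2 = pd.DataFrame(
    {
        "currency": ["USD"],
        "adj_date": [pd.Timestamp("2012-01-03")],
        "data_col2": [0.45],
    },
    index=index,
)

# Columns that exist in df2 but not in df (returned as a pandas Index).
cols_to_use = df2.columns.difference(df.columns)

# Merge on the index, taking only the non-overlapping columns from df2.
df_new = pd.merge(
    df, df2[cols_to_use], left_index=True, right_index=True, how="outer"
)

print(df_new.columns.tolist())
# ['currency', 'adj_date', 'data_col1', 'data_col2']
```

One thing to keep in mind with `how='outer'`: the shared columns now come from `df` only, so any index entry present only in `df2` will have missing values for currency and adj_date in the result.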
