Pandas Merge - How to avoid duplicating columns
Question
I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). Some columns appear in both data frames (currency and adj_date, for example).
What is the best way to merge these by index without ending up with two copies of currency and adj_date?
Each data frame has 90 columns, so I am trying to avoid writing everything out by hand.
df:
                   currency  adj_date    data_col1  ...
date        cusip
2012-01-01  XSDP   USD       2012-01-03  0.45
...

df2:
                   currency  adj_date    data_col2  ...
date        cusip
2012-01-01  XSDP   USD       2012-01-03  0.45
...
If I do:
dfNew = merge(df, df2, left_index=True, right_index=True, how='outer')
I get:
dfNew:
                   currency_x  adj_date_x  data_col2  ...  currency_y  adj_date_y
date        cusip
2012-01-01  XSDP   USD         2012-01-03  0.45            USD         2012-01-03
Thank you! ...
Recommended answer
You can work out which columns exist only in df2 and use that to select a subset of its columns for the merge.
cols_to_use = df2.columns.difference(df.columns)
Then perform the merge (note that cols_to_use is an Index object, but it has a handy tolist() method if you need a plain list).
dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')
Because df2 now contributes only columns that df lacks, no column names clash in the merge, so no _x/_y suffixes are added.
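As a minimal, self-contained sketch of the approach, the snippet below builds two one-row frames that mimic the layout in the question. The sample values and the name df_new are made up for illustration, and pd.merge is used in place of the bare merge call, assuming pandas is imported as pd. One thing to be aware of: Index.difference returns its result sorted by default, so the columns taken from df2 may not keep their original order.

```python
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2012-01-01"), "XSDP")], names=["date", "cusip"]
)
shared = {"currency": ["USD"], "adj_date": [pd.Timestamp("2012-01-03")]}
df = pd.DataFrame({**shared, "data_col1": [0.45]}, index=idx)
df2 = pd.DataFrame({**shared, "data_col2": [0.45]}, index=idx)

# Columns that appear in df2 but not in df (returned as an Index).
cols_to_use = df2.columns.difference(df.columns)

# Merge on the index, taking only the non-overlapping columns from df2.
df_new = pd.merge(
    df, df2[cols_to_use], left_index=True, right_index=True, how="outer"
)

print(df_new.columns.tolist())
# ['currency', 'adj_date', 'data_col1', 'data_col2']
```

This keeps df's copy of the shared columns and silently drops df2's, so it is only appropriate when the overlapping columns really hold the same values in both frames.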