Pandas Merge - How to avoid duplicating columns

Question

I am attempting a merge between two data frames. Each data frame has two index levels (date, cusip). Some columns, for example currency and adj_date, appear in both.

What is the best way to merge these by index without taking two copies of currency and adj_date?

Each data frame has 90 columns, so I am trying to avoid writing everything out by hand.

df:                 currency  adj_date   data_col1 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45
...

df2:                currency  adj_date   data_col2 ...
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45
...

If I do:

dfNew = pd.merge(df, df2, left_index=True, right_index=True, how='outer')

I get:

dfNew:              currency_x  adj_date_x   data_col2 ... currency_y adj_date_y
date        cusip
2012-01-01  XSDP      USD      2012-01-03   0.45             USD         2012-01-03

Thank you! ...

Recommended answer

You can work out which columns appear only in df2 and use them to select a subset of its columns for the merge.

cols_to_use = df2.columns.difference(df.columns)

Then perform the merge (note that cols_to_use is an Index object, which can be used to select columns from df2 directly; it also has a handy tolist() method if you need a plain list).

dfNew = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')

This avoids any column name clashes in the merge.
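
To make the approach concrete, here is a minimal runnable sketch. The one-row frames are hypothetical sample data modelled on the layout in the question (the real frames have about 90 columns), and the names clashing and df_new are introduced only for illustration. It first reproduces the _x / _y suffixes and then applies the column-difference fix.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's layout:
# a two-level (date, cusip) index and two columns shared by both frames.
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2012-01-01"), "XSDP")], names=["date", "cusip"]
)
df = pd.DataFrame(
    {"currency": ["USD"], "adj_date": [pd.Timestamp("2012-01-03")], "data_col1": [0.45]},
    index=idx,
)
df2 = pd.DataFrame(
    {"currency": ["USD"], "adj_date": [pd.Timestamp("2012-01-03")], "data_col2": [0.45]},
    index=idx,
)

# Naive merge: the shared columns come back twice, suffixed _x and _y.
clashing = pd.merge(df, df2, left_index=True, right_index=True, how="outer")

# Keep only the columns of df2 that df does not already have.
cols_to_use = df2.columns.difference(df.columns)
df_new = pd.merge(df, df2[cols_to_use], left_index=True, right_index=True, how="outer")

print(list(clashing.columns))
print(list(df_new.columns))
```

Note that this keeps df's copy of each shared column and discards df2's, so it assumes the two copies hold the same values. With how='outer', a row that exists only in df2 would end up with missing values in those shared columns.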
