How to merge Pandas dataframes without duplicating columns


Question

I have data of the following form:

frame1 = pd.DataFrame({'supplier1_match0': ['x'], 'id': [1]})
frame2 = pd.DataFrame({'supplier1_match0': ['2x'], 'id': [2]})

and wish to left join multiple such frames to a frame like this:

base_frame = pd.DataFrame({'id':[1,2,3]})

I merge on id and get:

merged = base_frame.merge(frame1, how='left', left_on='id', right_on='id')
merged = merged.merge(frame2, how='left', left_on='id', right_on='id')

   id supplier1_match0_x supplier1_match0_y
0   1                  x                NaN
1   2                NaN                 2x
2   3                NaN                NaN

The column is duplicated, with '_x' and '_y' suffixes appended. Here is what I need:

id, supplier1_match0, ...
1,  x
2,  2x
3, NaN

Is there a simple way to achieve this? There is a similar question (Nested dictionary to multiindex dataframe where dictionary keys are column labels), but the data there has a different shape. Note that I have multiple suppliers with varying numbers of matches, so I can't assume the data will have a "rectangular" shape. Thanks in advance.

Recommended answer

Your problem is that you don't really want to merge everything. You need to concat your first set of frames, then merge.

import pandas as pd
import numpy as np

base_frame.merge(pd.concat([frame1, frame2]), how='left')

#   id supplier1_match0
#0   1                x
#1   2               2x
#2   3              NaN
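When several suppliers match the same id, the concatenated frame holds more than one row for that id, and the left merge would repeat the row. A minimal sketch of one way to handle this, assuming it is acceptable to keep the first non-null value per id and column: collapse the concatenated frames with groupby(...).first() before merging. The frame3 and supplier2_match0 names below are hypothetical and only illustrate a second supplier.

```python
import pandas as pd

base_frame = pd.DataFrame({'id': [1, 2, 3]})
frame1 = pd.DataFrame({'supplier1_match0': ['x'], 'id': [1]})
frame2 = pd.DataFrame({'supplier1_match0': ['2x'], 'id': [2]})
# Hypothetical extra frame: a second supplier that also matched id 1
frame3 = pd.DataFrame({'supplier2_match0': ['y'], 'id': [1]})

# concat takes the union of the columns, so the frames need not share a shape
stacked = pd.concat([frame1, frame2, frame3], ignore_index=True)

# id 1 now occupies two rows; first() keeps the first non-null value
# per column within each id, giving one row per id
collapsed = stacked.groupby('id', as_index=False).first()

merged = base_frame.merge(collapsed, how='left', on='id')
print(merged)
```

If two frames supply different non-null values for the same id and column, only the first one survives the collapse, so this is only appropriate when such conflicts are not expected or do not matter.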


Alternatively, you could define base_frame so that it has all of the relevant columns of the other frames, set id as the index, and use .update. This ensures base_frame keeps the same number of rows, whereas the concat-and-merge approach above does not (an id that appears in more than one frame produces more than one row). Note, however, that data will be overwritten if there are multiple non-null values for a given cell.

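# np.nan replaces np.NaN, which was removed in NumPy 2.0; the column is cast
# to object so that string values can be written into it by .update
base_frame = (pd.DataFrame({'id': [1, 2, 3]})
              .assign(supplier1_match0=np.nan)
              .astype({'supplier1_match0': object})
              .set_index('id'))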

for df in [frame1, frame2]:
    base_frame.update(df.set_index('id'))

print(base_frame)

   supplier1_match0
id                 
1                 x
2                2x
3               NaN
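The overwrite caveat can be seen with a small sketch. It assumes a hypothetical frame_dup that supplies a second value for id 1, and a hypothetical helper empty_base that rebuilds the empty frame. By default .update lets the later frame win; passing overwrite=False makes it fill only cells that are still null, so the earlier value is kept.

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame({'supplier1_match0': ['x'], 'id': [1]})
# Hypothetical frame that supplies a conflicting value for id 1
frame_dup = pd.DataFrame({'supplier1_match0': ['z'], 'id': [1]})

def empty_base():
    # Hypothetical helper: an all-NaN object column indexed by id
    return pd.DataFrame({'supplier1_match0': [np.nan] * 3},
                        index=pd.Index([1, 2, 3], name='id'),
                        dtype=object)

# Default behaviour: each update overwrites, so the last frame wins
updated = empty_base()
for df in [frame1, frame_dup]:
    updated.update(df.set_index('id'))

# overwrite=False: only cells that are still null are filled,
# so the value from the first frame is kept
kept = empty_base()
for df in [frame1, frame_dup]:
    kept.update(df.set_index('id'), overwrite=False)

print(updated)
print(kept)
```

Which behaviour is preferable depends on whether later frames are meant to correct earlier ones or merely fill gaps.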
