Merge two dataframes without repeats in pandas


Question

I am trying to merge two dataframes: one with the columns customerId, full name, and emails, and the other with the columns customerId, amount, and date. I want the first dataframe to be the main dataframe, with the other dataframe's information included only where the customerIds match. I tried:

 merge = pd.merge(df, df2, on='customerId', how='left')

but the resulting dataframe contains a lot of repeats and looks wrong:

   customerId   full name                            emails     amount        date
0   002963338  Star shine                star.shine@cdw.com  $2,910.94  2016-06-14
1   002963338  Star shine                star.shine@cdw.com  $9,067.70  2016-05-27
2   002963338  Star shine                star.shine@cdw.com  $6,507.24  2016-04-12
3   002963338  Star shine                star.shine@cdw.com  $1,457.99  2016-02-24
4   986423367   palm tree  tree.palm@snapchat.com,tree@.com  $4,604.83  2016-07-16

This is not right, please help!

Recommended answer

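The problem is that the customerId column of the second dataframe contains duplicates. A left merge pairs every row of the left dataframe with every matching row of the right one, so a customer with four rows in df2 appears four times in the result.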
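To confirm that duplicate keys are the cause, one option is to count how often each customerId occurs in the second dataframe before merging. This is a minimal sketch; the `amount` values are made up and only stand in for the real df2:

```python
import pandas as pd

# Hypothetical stand-in for the second dataframe (values are invented).
df2 = pd.DataFrame({
    'customerId': [1, 2, 1, 2, 1, 1],
    'amount': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# True if any customerId occurs more than once.
has_duplicates = df2['customerId'].duplicated().any()

# Rows per customerId; the merge repeats each matching left-hand row this many times.
rows_per_customer = df2['customerId'].value_counts()

print(has_duplicates)
print(rows_per_customer)
```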

So the solution is to remove the duplicates, e.g. with drop_duplicates:

df2 = df2.drop_duplicates('customerId')
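By default drop_duplicates keeps the first row it encounters for each key, so which amount and date survive depends on row order. If a particular row is wanted, for example the most recent transaction, sorting before dropping is one way to get it. The sketch below assumes the date column can be parsed as datetimes; the data is invented for illustration:

```python
import pandas as pd

# Hypothetical transactions; several rows share customerId 1.
df2 = pd.DataFrame({
    'customerId': [1, 1, 2, 1],
    'amount': [100.0, 250.0, 75.0, 30.0],
    'date': pd.to_datetime(['2016-06-14', '2016-05-27', '2016-07-16', '2016-02-24']),
})

# Sort by date, then keep only the last (most recent) row per customerId.
latest = (
    df2.sort_values('date')
       .drop_duplicates('customerId', keep='last')
       .sort_values('customerId')
       .reset_index(drop=True)
)
print(latest)
```

Note that either variant discards the other rows for that customer, so it only fits when one row per customer is really what the merged result should contain.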

Sample:

df = pd.DataFrame({'customerId':[1,2,1,1,2], 'full name':list('abcde')})
print (df)
   customerId full name
0           1         a
1           2         b
2           1         c
3           1         d
4           2         e

df2 = pd.DataFrame({'customerId':[1,2,1,2,1,1], 'full name':list('ABCDEF')})
print (df2)
   customerId full name
0           1         A
1           2         B
2           1         C
3           2         D
4           1         E
5           1         F


merge = pd.merge(df, df2, on='customerId', how='left')
print (merge)
    customerId full name_x full name_y
0            1           a           A
1            1           a           C
2            1           a           E
3            1           a           F
4            2           b           B
5            2           b           D
6            1           c           A
7            1           c           C
8            1           c           E
9            1           c           F
10           1           d           A
11           1           d           C
12           1           d           E
13           1           d           F
14           2           e           B
15           2           e           D

df2 = df2.drop_duplicates('customerId')
merge = pd.merge(df, df2, on='customerId', how='left')
print (merge)
   customerId full name_x full name_y
0           1           a           A
1           2           b           B
2           1           c           A
3           1           d           A
4           2           e           B
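As a safeguard, pd.merge also accepts a validate argument, which raises an error instead of silently multiplying rows when the right-hand keys are not unique. The following sketch reuses the sample frames from above; the variable names `raised`, `df2_unique`, and `merged` are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'customerId': [1, 2, 1, 1, 2], 'full name': list('abcde')})
df2 = pd.DataFrame({'customerId': [1, 2, 1, 2, 1, 1], 'full name': list('ABCDEF')})

# With duplicate keys on the right, validate='many_to_one' raises MergeError.
try:
    pd.merge(df, df2, on='customerId', how='left', validate='many_to_one')
    raised = False
except pd.errors.MergeError:
    raised = True

# After dropping duplicates the same call succeeds, with one output row per left row.
df2_unique = df2.drop_duplicates('customerId')
merged = pd.merge(df, df2_unique, on='customerId', how='left', validate='many_to_one')

print(raised)
print(len(merged) == len(df))
```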
