Duplicated rows when merging dataframes in Python

Problem description

I am currently merging 2 dataframes with an outer join, but after merging, all the rows appear duplicated, even though the columns I merged on contain the same values. In detail:

import pandas as pd

list_1 = pd.read_csv('list_1.csv')
list_2 = pd.read_csv('list_2.csv')

merged_list = pd.merge(list_1, list_2, on=['email_address'], how='inner')

With the following inputs and result:

list_1:

email_address, name, surname
john.smith@email.com, john, smith
john.smith@email.com, john, smith
elvis@email.com, elvis, presley

list_2:

email_address, street, city
john.smith@email.com, street1, NY
john.smith@email.com, street1, NY
elvis@email.com, street2, LA

merged_list:

email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA
elvis@email.com, elvis, presley, street2, LA

My question is, shouldn't it be like this?

merged_list (how I would like it to be :D):

email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA

How can I make it so that it becomes like this? Thanks a lot for your help!

Recommended answer

list_2_nodups = list_2.drop_duplicates()
pd.merge(list_1, list_2_nodups, on=['email_address'])

The duplicate rows are expected. Each john smith row in list_1 matches every john smith row in list_2, so two rows on each side produce 2 x 2 = 4 rows for that key. I had to drop the duplicates in one of the lists. I chose list_2.
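For anyone who wants to try this without the CSV files, here is a minimal sketch that rebuilds the sample rows inline. The `io.StringIO` stand-ins and the names `naive` and `john_rows` are my own additions for illustration. The `validate="many_to_one"` argument is optional. It is included on the assumption that you would rather have pandas raise an error than silently multiply rows when the right-hand keys turn out not to be unique.

```python
import io

import pandas as pd

# Inline copies of the question's sample rows (stand-ins for the CSV files).
list_1 = pd.read_csv(io.StringIO(
    "email_address,name,surname\n"
    "john.smith@email.com,john,smith\n"
    "john.smith@email.com,john,smith\n"
    "elvis@email.com,elvis,presley\n"
))
list_2 = pd.read_csv(io.StringIO(
    "email_address,street,city\n"
    "john.smith@email.com,street1,NY\n"
    "john.smith@email.com,street1,NY\n"
    "elvis@email.com,street2,LA\n"
))

# A plain merge pairs every matching left row with every matching right row.
naive = pd.merge(list_1, list_2, on="email_address", how="inner")
john_rows = (naive["email_address"] == "john.smith@email.com").sum()

# Drop exact duplicate rows from one side so its keys become unique.
list_2_nodups = list_2.drop_duplicates()

# validate="many_to_one" makes pandas check that the right-hand keys are unique.
merged_list = pd.merge(
    list_1,
    list_2_nodups,
    on="email_address",
    how="inner",
    validate="many_to_one",
)

print(john_rows)         # rows for the john.smith key before deduplicating
print(len(merged_list))  # total rows after deduplicating list_2
```

Note that `drop_duplicates()` with no arguments only removes rows that are identical in every column. If list_2 held two different addresses for the same email, both would survive and the merge would still multiply rows. In that case, passing `subset=['email_address']` keeps one row per email, at the cost of discarding the other addresses.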
