Duplicated rows when merging dataframes in Python
Problem description
I am currently merging 2 dataframes with an outer join, but after merging, I see all the rows are duplicated even when the columns I did the merge upon contain the same values. In detail:
import pandas as pd

list_1 = pd.read_csv('list_1.csv')
list_2 = pd.read_csv('list_2.csv')
merged_list = pd.merge(list_1, list_2, on=['email_address'], how='inner')
With the following inputs and results:
list_1:
email_address, name, surname
john.smith@email.com, john, smith
john.smith@email.com, john, smith
elvis@email.com, elvis, presley
list_2:
email_address, street, city
john.smith@email.com, street1, NY
john.smith@email.com, street1, NY
elvis@email.com, street2, LA
merged_list:
email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA
elvis@email.com, elvis, presley, street2, LA
My question is, shouldn't it be like this?
merged_list (how I would like it to be :D):
email_address, name, surname, street, city
john.smith@email.com, john, smith, street1, NY
john.smith@email.com, john, smith, street1, NY
elvis@email.com, elvis, presley, street2, LA
How can I make it so that it becomes like this? Thanks a lot for your help!
Recommended answer
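# Drop fully identical rows from list_2 so each row of list_1 matches only once
list_2_nodups = list_2.drop_duplicates()
# how defaults to 'inner', the same join type used in the question's code
pd.merge(list_1, list_2_nodups, on=['email_address'])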
The duplicate rows are expected. Each john smith row in list_1 matches each john smith row in list_2, so two duplicates on each side produce 2 x 2 = 4 merged rows. I had to drop the duplicates in one of the lists. I chose list_2.
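To make the row multiplication concrete, here is a minimal sketch that rebuilds the question's two inputs as in-memory DataFrames (an assumption, standing in for the CSV files) and compares the plain merge with the de-duplicated one. The names `naive` and `key_counts` are introduced here only for illustration. The `validate="many_to_one"` argument is an optional extra, not part of the original answer: it asks pandas to raise an error if the right-hand keys are not unique.

```python
import pandas as pd

# In-memory stand-ins for list_1.csv and list_2.csv (assumed contents,
# copied from the tables in the question).
list_1 = pd.DataFrame({
    "email_address": ["john.smith@email.com", "john.smith@email.com", "elvis@email.com"],
    "name": ["john", "john", "elvis"],
    "surname": ["smith", "smith", "presley"],
})
list_2 = pd.DataFrame({
    "email_address": ["john.smith@email.com", "john.smith@email.com", "elvis@email.com"],
    "street": ["street1", "street1", "street2"],
    "city": ["NY", "NY", "LA"],
})

# Plain merge: every left row pairs with every right row sharing the key.
naive = pd.merge(list_1, list_2, on="email_address", how="inner")
key_counts = naive["email_address"].value_counts().to_dict()

# De-duplicate one side first, then merge. validate="many_to_one" makes
# pandas check that each key appears at most once in the right frame.
list_2_nodups = list_2.drop_duplicates()
merged_list = pd.merge(
    list_1,
    list_2_nodups,
    on="email_address",
    how="inner",
    validate="many_to_one",
)

print(key_counts)
print(merged_list)
```

Note that `drop_duplicates()` with no arguments only removes rows that are identical in every column. If list_2 held two different addresses for the same email, the keys would still repeat and the merge would still multiply rows; in that case something like `drop_duplicates(subset="email_address")` would be needed, at the cost of choosing which address to keep.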