Pandas merge creates unwanted duplicate entries

Problem description

I'm new to pandas and I want to merge two datasets that have similar columns. Each column shares many values with its counterpart in the other dataset but also has some values unique to it. Each column also contains some duplicates that I'd like to keep. My desired output is shown below. Adding how='inner' or how='outer' does not yield the desired result.

import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

print(pd.merge(df1,df2))

output:
   A
0  2
1  2
2  2
3  2
4  3
5  4
6  5

desired/expected output:
   A
0  2
1  2
2  3
3  4
4  5

Please let me know whether, and how, I can achieve the desired output using merge. Thank you!

EDIT: To clarify why this behavior confuses me: if I simply add another column, the result contains only two 2's rather than four, so I would expect my first example to contain two 2's as well. Why does the behavior seem to change? What is pandas doing?

import pandas as pd
dict1 = {'A':[2,2,3,4,5],
         'B':['red','orange','yellow','green','blue'],
        }
dict2 = {'A':[2,2,3,4,5],
         'B':['red','orange','yellow','green','blue'],
        }

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

print(pd.merge(df1,df2))

output:
   A       B
0  2     red
1  2  orange
2  3  yellow
3  4   green
4  5    blue

However, based on the first example I would expect:
   A       B
0  2     red
1  2  orange
2  2     red
3  2  orange
4  3  yellow
5  4   green
6  5    blue

Recommended answer
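
Some background on why the two examples behave differently: when no `on` argument is given, `merge` joins on every column the two frames have in common, and each left row is paired with every right row that has the same key. In the first example the key is `A` alone, so the two 2's on each side pair up 2 x 2 = 4 ways. In the second example the key is the pair (`A`, `B`), which is unique in each frame, so every row matches exactly once. The short check below is only an illustrative sketch using the question's data; the variable names `both` and `only_a` are made up here.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5],
                    'B': ['red', 'orange', 'yellow', 'green', 'blue']})
df2 = df1.copy()

# Default behavior: join on all shared columns, here ['A', 'B'].
both = pd.merge(df1, df2)

# Explicit single key: the duplicated 2's on each side pair up 2 x 2 ways.
only_a = pd.merge(df1, df2, on='A')

print(len(both))    # 5
print(len(only_a))  # 7
print(only_a)
```

Forcing `on='A'` in the two-column case reproduces the four 2's from the first example, which suggests the difference comes from the join key rather than from the extra column itself.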

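import pandas as pd

dict1 = {'A':[2,2,3,4,5]}
dict2 = {'A':[2,2,3,4,5]}

# reset_index() copies each row's position into a regular column named 'index'
df1 = pd.DataFrame(dict1).reset_index()
df2 = pd.DataFrame(dict2).reset_index()

# merging on 'A' pairs every 2 on the left with every 2 on the right;
# the position columns come out as 'index_x' (left) and 'index_y' (right)
df = df1.merge(df2, on = 'A')

# keep only the pairs whose rows sat at the same position in both frames
# (this relies on matching rows being at the same position in df1 and df2)
df = df.loc[df.index_x == df.index_y, ['A']].reset_index(drop=True)

print(df)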

Output:

   A
0  2
1  2
2  3
3  4
4  5
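
Comparing the original indexes only helps when matching rows sit at the same position in both frames. As a possible alternative, here is a minimal sketch that numbers the repeats of each value with `groupby(...).cumcount()` and merges on the value plus that counter. The helper column name `occurrence` is invented for illustration, and the sketch assumes that the n-th duplicate on the left should be paired with the n-th duplicate on the right.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})
df2 = pd.DataFrame({'A': [2, 2, 3, 4, 5]})

# Number each repeat of a value: the first 2 gets 0, the second 2 gets 1, ...
# 'occurrence' is a hypothetical helper column, not something pandas provides.
left = df1.assign(occurrence=df1.groupby('A').cumcount())
right = df2.assign(occurrence=df2.groupby('A').cumcount())

# (A, occurrence) is unique within each frame, so each row matches at most once.
result = left.merge(right, on=['A', 'occurrence']).drop(columns='occurrence')

print(result)
```

With the question's data this keeps both 2's without multiplying them. If the two frames held different numbers of a given value, the inner merge would keep only as many copies as the smaller side has.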
