pandas merge df many to many without duplicates
Problem description
Suppose I have two DataFrames like the ones below:
import pandas as pd
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"c": [6,7,8,9]
}
df1 = pd.DataFrame(data_dic)
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"d": [10,10,12,13]
}
df2 = pd.DataFrame(data_dic)
Result:
df1
   a  b  c
0  0  3  6
1  0  3  7
2  1  4  8
3  2  5  9
df2
   a  b   d
0  0  3  10
1  0  3  10
2  1  4  12
3  2  5  13
Both DataFrames have the same key values in columns 'a' and 'b'.
Q: How can I merge them without creating duplicates? [Merging the DataFrames and then deleting the duplicates is not a satisfactory option.]
I tried to do this with the code below, but it creates duplicate rows and significantly increases the file size.
df = pd.merge(df1, df2, on=['a', 'b'] , how='left', validate='many_to_many')
df = pd.merge(df1, df2, on=['a', 'b'] , how='inner')
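To see where the extra rows come from, here is a minimal sketch using the same df1 and df2 (the names merged and rows_per_key are only illustrative). Within each (a, b) key, a plain merge pairs every matching row of df1 with every matching row of df2, so the duplicated key (0, 3) yields 2 x 2 = 4 rows instead of 2:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 10, 12, 13]})

# Plain many-to-many merge on the key columns only.
merged = pd.merge(df1, df2, on=["a", "b"], how="inner")

# Count how many output rows each key pair produced.
rows_per_key = merged.groupby(["a", "b"]).size()

print(len(merged))    # total rows after the merge
print(rows_per_key)   # rows per (a, b) pair
```

The keys (1, 4) and (2, 5) appear once in each DataFrame and contribute one row each, which gives 6 rows in total rather than the 4 that are wanted.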
Expected result:
   a  b  c   d
0  0  3  6  10
1  0  3  7  10
2  1  4  8  12
3  2  5  9  13
Thank you all for your hard work.
Recommended answer
Use GroupBy.cumcount to add a counter column to both DataFrames, then merge on the key columns plus the added column:
df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()
df = pd.merge(df1, df2, on=['a', 'b', 'g'] , how='inner')
print (df)
   a  b  c  g   d
0  0  3  6  0  10
1  0  3  7  1  10
2  1  4  8  0  12
3  2  5  9  0  13
The difference from other solutions is easiest to see when the data in the second df is changed so that the second 10 becomes 11. The first duplicate pair a, b from df1 is then correctly merged with the first a, b pair from df2, the second with the second, and the same holds for all duplicates and for unique pairs:
data_dic = {
"a": [0,0,1,2],
"b": [3,3,4,5],
"d": [10,11,12,13]
}
df2 = pd.DataFrame(data_dic)
df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()
df = pd.merge(df1, df2, on=['a', 'b', 'g'] , how='inner')
print (df)
   a  b  c  g   d
0  0  3  6  0  10
1  0  3  7  1  11
2  1  4  8  0  12
3  2  5  9  0  13
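The expected result in the question has no g column, so the helper can be dropped after the merge. The following is a sketch of one way to do that, not part of the original answer; the names keys, left, right and result are illustrative. It uses assign so that df1 and df2 are left unmodified:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 11, 12, 13]})

keys = ["a", "b"]

# Number the rows within each (a, b) group: 0 for the first occurrence, 1 for the second, ...
left = df1.assign(g=df1.groupby(keys).cumcount())
right = df2.assign(g=df2.groupby(keys).cumcount())

# Merge on the keys plus the counter, then remove the helper column.
result = left.merge(right, on=keys + ["g"], how="inner").drop(columns="g")
print(result)
```

If one DataFrame has more duplicates of a key than the other, how='inner' drops the unmatched rows; how='left' would keep the rows of the left DataFrame and fill d with NaN.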