pandas merge df many to many without duplicates

Views: 88  Published: 2020/10/16 21:23:48  Tags: python pandas dataframe


Question

Suppose I have two DataFrames like the ones below:

import pandas as pd

data_dic = {
    "a": [0,0,1,2],
    "b": [3,3,4,5],
    "c": [6,7,8,9]
}
df1 = pd.DataFrame(data_dic)

data_dic = {
    "a": [0,0,1,2],
    "b": [3,3,4,5],
    "d": [10,10,12,13]
}
df2 = pd.DataFrame(data_dic)

Result:

df1

   a  b  c
0  0  3  6
1  0  3  7
2  1  4  8
3  2  5  9

df2

   a  b   d
0  0  3  10
1  0  3  10
2  1  4  12
3  2  5  13

Both DataFrames have the same key values in columns 'a' and 'b'.

Q: How can I merge them without duplicates? [Merging the DataFrames and then deleting the duplicates is not a satisfactory option.]

I tried to do this with the code below, but it creates duplicate rows and significantly increases the file size.

df = pd.merge(df1, df2, on=['a', 'b'] , how='left', validate='many_to_many')
df = pd.merge(df1, df2, on=['a', 'b'] , how='inner')
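
The duplicate rows come from the repeated key pair (0, 3): it appears twice in each DataFrame, so a merge on 'a' and 'b' alone pairs every matching row on the left with every matching row on the right. A minimal sketch that reproduces this with the data from the question (the variable name merged is only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 10, 12, 13]})

# Merging on the keys alone: the pair (0, 3) occurs twice on each side,
# so it contributes 2 * 2 = 4 rows instead of 2.
merged = pd.merge(df1, df2, on=["a", "b"], how="inner")

print(len(df1), len(df2), len(merged))  # 4 4 6
```

The two unique pairs contribute one row each, which gives 4 + 1 + 1 = 6 rows in total.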

Expected result:

   a  b  c   d
0  0  3  6  10
1  0  3  7  10
2  1  4  8  12
3  2  5  9  13

Thank you all for your hard work.

Recommended answer

Use GroupBy.cumcount to add a counter column to both DataFrames, then merge on the key columns together with the added column:

df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()

df = pd.merge(df1, df2, on=['a', 'b', 'g'] , how='inner')
print (df)
   a  b  c  g   d
0  0  3  6  0  10
1  0  3  7  1  10
2  1  4  8  0  12
3  2  5  9  0  13
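
To see why the counter column removes the duplicates, it may help to look at what cumcount returns on its own. The sketch below only inspects the counter for df1 from the question; the variable name counter is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})

# cumcount numbers the rows inside each (a, b) group, starting at 0,
# and returns the numbers aligned with the original row order.
counter = df1.groupby(["a", "b"]).cumcount()

print(counter.tolist())  # [0, 1, 0, 0]
```

The two rows with the pair (0, 3) get 0 and 1, so the triple (a, b, g) is unique in each DataFrame and the merge matches at most one row to one row.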

The difference from other solutions is easiest to see when the second value of d in the second DataFrame is changed from 10 to 11: the first duplicate pair a, b from df1 is merged with the first a, b pair from df2, the second with the second, and the same holds for all duplicates and for unique pairs:

data_dic = {
    "a": [0,0,1,2],
    "b": [3,3,4,5],
    "d": [10,11,12,13]
}
df2 = pd.DataFrame(data_dic)


df1['g'] = df1.groupby(['a','b']).cumcount()
df2['g'] = df2.groupby(['a','b']).cumcount()

df = pd.merge(df1, df2, on=['a', 'b', 'g'] , how='inner')
print (df)

   a  b  c  g   d
0  0  3  6  0  10
1  0  3  7  1  11
2  1  4  8  0  12
3  2  5  9  0  13
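
The expected result in the question has no g column, and the answer's code adds g to df1 and df2 in place. One possible way to wrap the same technique so that the inputs stay untouched and the helper column is dropped afterwards is sketched below. The function name merge_by_occurrence and the helper column name _occurrence are hypothetical and not part of pandas; the sketch assumes that neither input already has a column with that name.

```python
import pandas as pd


def merge_by_occurrence(left, right, keys, how="inner"):
    """Merge on keys, pairing the n-th occurrence of a key on the left
    with the n-th occurrence of the same key on the right."""
    keys = list(keys)
    helper = "_occurrence"  # hypothetical helper column name

    # assign returns new DataFrames, so left and right are not modified
    left_numbered = left.assign(**{helper: left.groupby(keys).cumcount()})
    right_numbered = right.assign(**{helper: right.groupby(keys).cumcount()})

    merged = left_numbered.merge(right_numbered, on=keys + [helper], how=how)
    return merged.drop(columns=helper)


df1 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "c": [6, 7, 8, 9]})
df2 = pd.DataFrame({"a": [0, 0, 1, 2], "b": [3, 3, 4, 5], "d": [10, 11, 12, 13]})

result = merge_by_occurrence(df1, df2, ["a", "b"])
print(result)
```

Note that the pairing depends on row order: which d value ends up next to which c value for a duplicated key is decided by the order of the rows in each DataFrame.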

