How to count overlap rows among multiple dataframes?

Problem description

I have multiple dataframes like the ones below.

import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})

Each dataframe has 3 columns, and some rows appear in more than one dataframe (e.g. df1 and df2 both contain the row "aaa, bbb, ccc").

I want to know how the rows overlap among the dataframes, and to produce an output with one row per distinct (Col1, Col2, Col3) combination and one indicator column per dataframe. In this output, the value is 1 if the identical row is found in that dataframe, otherwise 0. Does anyone know how to produce this output?

In the actual data, I have about 100 dataframes. I first tried pd.merge, but could not apply it to 100 dataframes.

Thank you very much for your help.

Recommended answer

Here is one way using concat and get_dummies:

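dfs = [df1, df2, df3]  # list of all dataframes to compare

# Stack the dataframes, tagging each row with the name of its source dataframe
final = pd.concat([df.assign(key=f"df{i + 1}") for i, df in enumerate(dfs)], sort=False)

# One-hot encode the tag, then take the max per distinct row (1 = present, 0 = absent);
# dtype=int keeps 0/1 output on pandas versions where get_dummies returns booleans
final = (final.assign(**pd.get_dummies(final.pop('key'), dtype=int))
         .groupby(['Col1', 'Col2', 'Col3']).max().reset_index())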


  Col1 Col2 Col3  df1  df2  df3
0  aaa  bbb  ccc    1    1    0
1  ddd  eee  fff    1    0    0
2  ggg  hhh  iii    1    0    0
3  ppp  ttt  qqq    0    0    1
4  qqq  eee  www    0    1    1
5  rrr  ttt  yyy    0    0    1
6  zzz  xxx  yyy    0    1    1
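
For around 100 dataframes, as in the question, the same tag-and-stack idea can be wrapped in a small helper. The following is only a sketch under stated assumptions, not the answerer's code: the function name `overlap_indicator` and its `names` parameter are hypothetical, all dataframes are assumed to share the same columns, and it swaps get_dummies for a groupby/size/unstack step, which is one of several ways to build the same 0/1 table.

```python
import pandas as pd


def overlap_indicator(frames, names=None):
    """Return one row per distinct input row, with a 0/1 column per dataframe.

    `overlap_indicator` and `names` are hypothetical, for illustration only.
    Assumes every dataframe in `frames` has the same columns.
    """
    if names is None:
        names = [f"df{i + 1}" for i in range(len(frames))]
    cols = list(frames[0].columns)

    # Stack all dataframes, tagging each row with its source name
    tagged = pd.concat(
        [df.assign(key=name) for df, name in zip(frames, names)],
        ignore_index=True,
    )

    # Count occurrences of each row per source, pivoting sources into columns
    counts = tagged.groupby(cols + ["key"]).size().unstack("key", fill_value=0)

    # Keep the original dataframe order and add a column for any source
    # that contributed no rows
    counts = counts.reindex(columns=names, fill_value=0)

    # Collapse counts to presence flags: 1 if the row occurs at least once
    result = (counts > 0).astype(int).reset_index()
    result.columns.name = None
    return result


df1 = pd.DataFrame({"Col1": ["aaa", "ddd", "ggg"], "Col2": ["bbb", "eee", "hhh"], "Col3": ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({"Col1": ["aaa", "zzz", "qqq"], "Col2": ["bbb", "xxx", "eee"], "Col3": ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({"Col1": ["rrr", "zzz", "qqq", "ppp"], "Col2": ["ttt", "xxx", "eee", "ttt"], "Col3": ["yyy", "yyy", "www", "qqq"]})

result = overlap_indicator([df1, df2, df3])

# Number of dataframes in which each distinct row appears
n_sources = result[["df1", "df2", "df3"]].sum(axis=1)
print(result.assign(n_sources=n_sources))
```

Summing the indicator columns, as in the last lines, gives the number of dataframes that contain each row, which may be closer to a "count" of overlaps than the 0/1 table alone.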
