pandas -基于列条目的两个数据框的交集 [英] Pandas - intersection of two data frames based on column entries

查看:106
本文介绍了 pandas -基于列条目的两个数据框的交集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

假设我有两个这样的DataFrame:

Suppose I have two DataFrames like so:

>>dfA
             S                      T            prob
0        ! ! !                ! ! ! !   8.1623999e-05
1        ! ! !                ! ! ! "   0.00354090007
2        ! ! !                ! ! ! .   0.00210241997
3        ! ! !                ! ! ! ?  6.55684998e-05
4        ! ! !                  ! ! !     0.203119993
5        ! ! !                ! ! ! "  6.62070015e-05
6        ! ! !                    ! !   0.00481862016
7        ! ! !                      !    0.0274260994
8        ! ! !                " ! ! !  7.99940026e-05
9        ! ! !                    " !  1.51188997e-05
10       ! ! !                      "  8.50678989e-05

>>dfB
             S                      T                                 knstats
0        ! ! !                ! ! ! !                 knstats=2,391,104,64,25
1        ! ! !                ! ! ! "                    knstats=4,391,6,64,2
2        ! ! !                ! ! ! .                    knstats=4,391,5,64,2
3        ! ! !                ! ! ! ?                    knstats=1,391,4,64,4
4        ! ! !                  ! ! !               knstats=220,391,303,64,55
5        ! ! !                    ! !               knstats=16,391,957,64,115
6        ! ! !                      !              knstats=28,391,5659,64,932
7        ! ! !                " ! ! !                    knstats=2,391,2,64,1
8        ! ! !                    " !                  knstats=1,391,37,64,13
9        ! ! !                      "     knstats=2,391,1.11721e+06,64,180642
10       ! ! !                    . "           knstats=2,391,120527,64,20368

我想创建一个新的DataFrame,它由在两个矩阵中具有匹配的"S"和"T"条目的行以及dfA的prob列和dfB的knstats列组成.结果应类似于以下内容,并且顺序必须相同:

I want to create a new DataFrame which is composed of the rows which have matching "S" and "T" entries in both matrices, along with the prob column from dfA and the knstats column from dfB. The result should look something like the following, and it is important that the order is the same:

             S                      T            prob                             knstats
0        ! ! !                ! ! ! !   8.1623999e-05             knstats=2,391,104,64,25
1        ! ! !                ! ! ! "   0.00354090007                knstats=4,391,6,64,2
2        ! ! !                ! ! ! .   0.00210241997                knstats=4,391,5,64,2
3        ! ! !                ! ! ! ?  6.55684998e-05                knstats=1,391,4,64,4
4        ! ! !                  ! ! !     0.203119993           knstats=220,391,303,64,55
5        ! ! !                    ! !   0.00481862016           knstats=16,391,957,64,115
6        ! ! !                      !    0.0274260994          knstats=28,391,5659,64,932
7        ! ! !                " ! ! !  7.99940026e-05                knstats=2,391,2,64,1
8        ! ! !                    " !  1.51188997e-05              knstats=1,391,37,64,13
9        ! ! !                      "  8.50678989e-05 knstats=2,391,1.11721e+06,64,180642

推荐答案

您可以将它们合并:

s1 = pd.merge(dfA, dfB, how='inner', on=['S', 'T'])

要删除NA行:

s1.dropna(inplace=True)

这篇关于 pandas -基于列条目的两个数据框的交集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆