Selecting Unique Rows between Two DataFrames in Pandas


Question

I have two data frames, A and B, of unequal dimensions. I would like to create a data frame C that contains ONLY the rows that are unique between A and B. I tried to follow this solution (excluding rows from a pandas dataframe based on column value and not index value) but could not get it to work.

Here is an example:

Say this is DF_A:

    Star_ID         Loc_ID      pmRA        pmDE    Field     Jmag    Hmag  
 2M00000032+5737103  4264    0.000000    0.000000    N7789   10.905  10.635
 2M00000068+5710233  4264    8.000000    -18.000000  N7789   10.664  10.132
 2M00000222+5625359  4264    0.000000    0.000000    N7789   11.982  11.433
 2M00000818+5634264  4264    0.000000    0.000000    N7789   12.501  11.892
 2M00001242+5524391  4264    0.000000    -4.000000   N7789   12.091  11.482

And this is DF_B:

2M00000032+5737103  
2M00000068+5710233
2M00001242+5524391

So the first two Star_IDs and the last one are common to DF_A and DF_B. I would like to create DF_C such that:

DF_C:

        Star_ID         Loc_ID      pmRA        pmDE    Field     Jmag    Hmag
     2M00000222+5625359  4264    0.000000    0.000000    N7789   11.982  11.433
     2M00000818+5634264  4264    0.000000    0.000000    N7789   12.501  11.892

Recommended answer

This worked for me:

In [7]:

df1[~df1.Star_ID.isin(df2.Star_ID)]

Out[7]:

              Star_ID  Loc_ID  pmRA  pmDE  Field    Jmag    Hmag
2  2M00000222+5625359    4264     0     0  N7789  11.982  11.433
3  2M00000818+5634264    4264     0     0  N7789  12.501  11.892

[2 rows x 7 columns]

So what we do here is create a boolean mask: isin asks which Star_ID values in df1 also appear in df2, and the ~ operator then applies a logical NOT to that condition, so only the rows of df1 whose Star_ID is absent from df2 are kept. The solution you linked to does pretty much the same thing, but I think you may not have understood the syntax?
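To make the mask visible, here is a minimal self-contained sketch that rebuilds the sample frames from the question and prints the intermediate boolean Series. The variable names in_both and df_c are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from DF_A and DF_B in the question.
df1 = pd.DataFrame({
    "Star_ID": [
        "2M00000032+5737103",
        "2M00000068+5710233",
        "2M00000222+5625359",
        "2M00000818+5634264",
        "2M00001242+5524391",
    ],
    "Loc_ID": [4264] * 5,
    "pmRA": [0.0, 8.0, 0.0, 0.0, 0.0],
    "pmDE": [0.0, -18.0, 0.0, 0.0, -4.0],
    "Field": ["N7789"] * 5,
    "Jmag": [10.905, 10.664, 11.982, 12.501, 12.091],
    "Hmag": [10.635, 10.132, 11.433, 11.892, 11.482],
})
df2 = pd.DataFrame({
    "Star_ID": [
        "2M00000032+5737103",
        "2M00000068+5710233",
        "2M00001242+5524391",
    ]
})

# True where a df1 Star_ID also occurs in df2.
in_both = df1["Star_ID"].isin(df2["Star_ID"])

# ~ flips the mask, keeping the rows of df1 that are NOT in df2.
df_c = df1[~in_both]

print(in_both.tolist())
print(df_c)
```

The original row labels (2 and 3) are preserved in df_c; calling reset_index(drop=True) on the result would renumber them from 0 if that is preferred.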

EDIT

To get both the values that are only in df1 and the values that are only in df2, you could do this:

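# DataFrame.append was removed in pandas 2.0; pd.concat (with pandas imported as pd) does the same job
unique_vals = pd.concat([df1[~df1.Star_ID.isin(df2.Star_ID)], df2[~df2.Star_ID.isin(df1.Star_ID)]], ignore_index=True)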
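As an alternative to combining two isin masks, an outer merge with indicator=True labels each row by where it came from. This is not what the answer above uses, only a sketch of another route to the same result. The frames are cut down to two columns, and the ID "2M99999999+0000000" is made up here so that df2 has a row missing from df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Star_ID": ["2M00000032+5737103", "2M00000222+5625359"],
    "Jmag": [10.905, 11.982],
})
# The second ID below is hypothetical, added only to show a df2-only row.
df2 = pd.DataFrame({
    "Star_ID": ["2M00000032+5737103", "2M99999999+0000000"],
})

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both".
merged = df1.merge(df2, on="Star_ID", how="outer", indicator=True)

# Keep the rows that appear in exactly one of the two frames.
only_one_side = merged[merged["_merge"] != "both"]

print(only_one_side)
```

Rows that come only from df2 carry NaN in the columns that exist only in df1 (Jmag here), and the _merge column can be dropped afterwards if it is not needed.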

FURTHER EDIT

So the problem was that the csv had leading spaces, which caused all values to be unique in both datasets. To correct this you need to do this:

df1.Apogee_ID = df1.Apogee_ID.str.lstrip()
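The effect of the leading spaces can be reproduced with a small in-memory csv. The sketch below uses the Star_ID column from the question's sample instead of Apogee_ID, and the csv text is invented for illustration. It also shows read_csv's skipinitialspace option, which skips spaces after the delimiter at load time and may avoid the problem up front.

```python
import io
import pandas as pd

# Hypothetical csv text with a space after each comma, mimicking the issue.
raw = (
    "Loc_ID,Star_ID\n"
    "4264, 2M00000032+5737103\n"
    "4264, 2M00000222+5625359\n"
)
df1 = pd.read_csv(io.StringIO(raw))
df2 = pd.DataFrame({"Star_ID": ["2M00000032+5737103"]})

# With the leading space still in place nothing matches.
matches_before = int(df1["Star_ID"].isin(df2["Star_ID"]).sum())

# Strip the whitespace, then compare again.
df1["Star_ID"] = df1["Star_ID"].str.lstrip()
matches_after = int(df1["Star_ID"].isin(df2["Star_ID"]).sum())

# Alternatively, drop the spaces while reading the file.
df1_clean = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(matches_before, matches_after)
print(df1_clean["Star_ID"].tolist())
```

If trailing spaces are also possible, str.strip() removes whitespace on both sides.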
