Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset


Question

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I am has a solution.

Let's say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then appending the zip-133 rows with df2 = pd.concat([df2, df1[df1['zip'] == 133]]), I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3
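The question's df2 stacks two row subsets of df1. Row-wise stacking is a job for pd.concat: DataFrame.join aligns on the index and adds columns, so applying it to two frames with identical column names raises an error about overlapping columns instead of appending rows. A minimal sketch that rebuilds df1 from the table above and builds df2 by concatenation, assuming Python 3 and a current pandas release (where DataFrame.append no longer exists):

```python
import pandas as pd

# Rebuild df1 from the question's table.
df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# Stack the two boolean selections row-wise.
# Each selection keeps df1's original index labels.
df2 = pd.concat([df1[df1["zip"] == 123], df1[df1["zip"] == 133]])
print(df2)
```

Because the selections keep df1's index labels (0, 1, 2 here), df2 still records which rows of df1 it came from.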

What I'd like to do is either:

1) Remove the rows from df1 that match df2

OR

2) After df2 has been created, remove from df1 the rows that df2 is composed of (a difference?)

Hope all of that makes sense. Please let me know if any more info is needed.

Ideally, a third DataFrame would be created that looks like this:

df3=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 that is not in df2. Thanks!

Recommended Answer

Two options come to mind. First, use isin and a boolean mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
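A self-contained version of the session above, written for Python 3 and a current pandas (the column values are copied from the question's table). Computing the mask once and reusing it with ~ avoids calling isin twice:

```python
import pandas as pd

# Same data as df in the session above.
df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]
mask = df["zip"].isin(keep)  # boolean Series: True where zip is in keep

df_yes = df[mask]   # rows whose zip is in keep
df_no = df[~mask]   # ~ inverts the mask: every other row

print(df_no)
```

df_no is the "third DataFrame" the question asks for; both pieces keep the original index labels.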

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then:

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
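The transcript above comes from Python 2, where dict.values() returns a list; under Python 3 it returns a view, so it needs wrapping in list(). A sketch of the same grouping for Python 3 and a current pandas, reusing the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})
keep = [123, 133]

# Group by a boolean key: True for rows whose zip is in keep, False otherwise.
grouped = df.groupby(df["zip"].isin(keep))

df_yes = grouped.get_group(True)
df_no = grouped.get_group(False)

# Groups come out in sorted key order (False, then True);
# list() turns the Python 3 dict view into a list.
parts = list(dict(list(grouped)).values())
```

get_group raises a KeyError when one side is empty (for example, if every zip is in keep), so the mask approach is the simpler choice when that can happen.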

Which approach makes the most sense depends on the context, but I think you get the idea.
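The answer filters on the zip values directly. For the question's option 2, where df2 already exists, another route is to use index labels. A subset taken from df1 by boolean selection keeps df1's index labels, so those labels identify exactly which rows to remove. A minimal sketch, assuming df1 has a unique index (with duplicate labels, drop would remove every row sharing a label) and that df2 was not re-indexed (for example with reset_index) after it was built:

```python
import pandas as pd

df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# df2 is built by boolean selection, so its rows keep df1's labels (0, 1, 2).
df2 = pd.concat([df1[df1["zip"] == 123], df1[df1["zip"] == 133]])

# Drop every row of df1 whose index label appears in df2.
df3 = df1.drop(df2.index)

# Equivalent mask form: keep rows whose label is not in df2's index.
df3_alt = df1[~df1.index.isin(df2.index)]
```

drop raises a KeyError if df2 contains a label that is not in df1, while the mask form silently ignores such labels.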
