Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset
Problem description
I'm running Python 2.7 with the Pandas 0.11.0 library installed.
I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I am has a solution.
Let's say my data, in df1, looks like the following:
df1=
zip x y access
123 1 1 4
123 1 1 6
133 1 2 3
145 2 2 3
167 3 1 1
167 3 1 2
Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]), I get the following subset of data:
df2=
zip x y access
123 1 1 4
123 1 1 6
133 1 2 3
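A side note on how the subset is built: as far as I know, DataFrame.join aligns on the index and adds columns rather than stacking rows, so a subset like the one shown is more likely to come from a row-wise concatenation. A minimal sketch, assuming df1 holds exactly the six rows listed above (the DataFrame construction here is reconstructed from that table and is not part of the original question):

```python
import pandas as pd

# df1 rebuilt from the table in the question (assumed values).
df1 = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# Stack the two filtered pieces row-wise; the original index labels are kept.
df2 = pd.concat([df1[df1["zip"] == 123], df1[df1["zip"] == 133]])
print(df2)
```

Because the index labels of df1 survive the concatenation, they can later be used to identify which rows of df1 went into df2.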
What I would like to do is either:
1) Remove from df1 the rows that match df2
OR
2) After df2 has been created, remove from df1 the rows (difference?) which df2 is composed of
Hope all of that makes sense. Please let me know if any more info is needed.
Ideally a third dataframe would be created that looks like this:
df3=
zip x y access
145 2 2 3
167 3 1 1
167 3 1 2
That is, everything from df1 that is not in df2. Thanks!
Recommended answer
Two options come to mind. First, use isin and a mask:
>>> df
zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3
>>> df_no
zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2
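The mask above filters on zip values. When the subset df2 already exists, the same idea can be applied to the index instead. This is only a sketch: it assumes df2 was sliced from df1 so that it still carries df1's index labels, and that those labels are unique. The names df3 and df3_mask are illustrative.

```python
import pandas as pd

# df1 rebuilt from the table in the question (assumed values).
df1 = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# A previously obtained subset that keeps df1's index labels.
df2 = df1[df1["zip"].isin([123, 133])]

# Option A: drop the rows whose index labels appear in df2.
df3 = df1.drop(df2.index)

# Option B: the same result via a boolean mask on the index.
df3_mask = df1[~df1.index.isin(df2.index)]

print(df3)
```

If df2 has been re-indexed, or df1 has duplicate index labels, this would remove the wrong rows, and filtering on the column values as in the isin mask is the safer route.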
Second, use groupby:
>>> grouped = df.groupby(df['zip'].isin(keep))
and then
>>> grouped.get_group(True)
zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3
>>> grouped.get_group(False)
zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2
>>> [g for k,g in list(grouped)]
[ zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2, zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3]
>>> dict(list(grouped))
{False: zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2, True: zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3}
>>> dict(list(grouped)).values()
[ zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2, zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3]
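The session above is from Python 2.7, where dict.values() returns a list. For reference, here is a self-contained sketch of the same groupby split that should also run on Python 3, where values() returns a view instead. The DataFrame is rebuilt from the printed output, and the names parts, df_yes and df_no are only illustrative.

```python
import pandas as pd

# df rebuilt from the output shown in the answer (assumed values).
df = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})
keep = [123, 133]

# Group by the boolean mask: one group for True, one for False.
grouped = df.groupby(df["zip"].isin(keep))

# Materialize both groups in a dict keyed by the mask value.
parts = {key: group for key, group in grouped}
df_yes = parts[True]
df_no = parts[False]

print(df_no)
```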
Which makes the most sense depends upon the context, but I think you get the idea.