Returning subset of each group from a pandas groupby object
Problem description
I have a multi-level dataframe that looks like this:
            date_time            name  note  value
list index
1    0      2015-05-22 05:37:59  Tom    129  False
     1      2015-05-22 05:38:59  Tom      0   True
     2      2015-05-22 05:39:59  Tom      0  False
     3      2015-05-22 05:40:59  Tom     45   True
2    4      2015-05-22 05:37:59  Kate   129   True
     5      2015-05-22 05:41:59  Kate     0  False
     5      2015-05-22 05:37:59  Kate     0   True
I want to iterate over the list level and, for the first row of each list group, check the value column; if it is False, delete that row. So the final goal is to delete the first row of every list group that has False in value.
I use this code, which seems logical:
def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df
But I get this error:
AttributeError: '_LocIndexer' object has no attribute 'groupby'
Could you explain what's wrong with my method?
Recommended answer
Your general approach, using loops, rarely works the way you want in pandas.
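As for the specific error in the question, it most likely comes from writing .loc(...) with parentheses. In pandas, .loc is an indexer meant to be used with square brackets; calling it like a function returns an indexer object rather than a dataframe, so a later .groupby on that result fails. A minimal sketch, using a small made-up frame (the data below is an assumption for illustration only, not the asker's real frame):

```python
import pandas as pd

# Hypothetical toy data, not the asker's real frame.
df = pd.DataFrame({
    "name": ["Tom", "Tom", "Kate"],
    "value": [False, True, True],
})

# Calling .loc like a function returns an indexer object, not a DataFrame.
indexer = df.loc(axis=0)
print(type(indexer).__name__)
print(hasattr(indexer, "groupby"))

# Square brackets with a boolean mask return the matching rows.
kept = df.loc[df["value"]]
print(kept)
```

Note also that the value column appears to hold booleans, so comparing it against the string 'False' would probably not select the rows that were intended.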
If you have a groupby object, you should use its apply, agg, filter, or transform methods. In your case, apply is appropriate.
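To make the difference between those four methods concrete, here is a rough sketch on a small made-up frame (the column values are assumptions chosen for illustration). Roughly speaking, agg reduces each group to one value, transform returns a result aligned with the original rows, filter keeps or drops whole groups, and apply runs an arbitrary function per group, which is why it suits "drop some rows from each group":

```python
import pandas as pd

# Hypothetical toy data loosely modelled on the question.
df = pd.DataFrame({
    "list": [1, 1, 1, 2, 2],
    "note": [129, 0, 45, 129, 0],
})
grouped = df.groupby("list")

# agg: one value per group.
totals = grouped["note"].agg("sum")

# transform: one value per original row, aligned with df.
group_means = grouped["note"].transform("mean")

# filter: keep or drop entire groups based on a condition.
big_groups = grouped.filter(lambda grp: len(grp) > 2)

# apply: arbitrary per-group function; here, everything but the first row.
tails = grouped["note"].apply(lambda s: s.iloc[1:])

print(totals, group_means, big_groups, tails, sep="\n\n")
```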
Your main goal is as follows:
"So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column)."
So let's write a simple function to do just that on a single, stand-alone dataframe:
def filter_firstrow_falses(df):
    # Drop the first row only when its 'value' is False.
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df
OK. Simple enough.
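Before applying it to groups, the function can be sanity-checked on its own. The sketch below repeats the function so that it runs stand-alone and tries it on two hand-built frames; the frame names and contents are made up for illustration:

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Same logic as above: drop the first row only when its value is False.
    if not df["value"].iloc[0]:
        return df.iloc[1:]
    return df

# Hypothetical test frames: one starts with False, the other with True.
starts_false = pd.DataFrame({"note": [129, 0, 45], "value": [False, True, False]})
starts_true = pd.DataFrame({"note": [129, 0], "value": [True, False]})

trimmed = filter_firstrow_falses(starts_false)
untouched = filter_firstrow_falses(starts_true)

print(trimmed)
print(untouched)
```

Only the leading False row is removed; a False later in the frame is left alone, which matches the stated goal.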
Now, let's apply that to each group of your real dataframe:
import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list'])             # create the groupby object
      .apply(filter_firstrow_falses)    # apply our function to each group
      .reset_index(drop=True)           # clean up the index
)
print(final)
   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
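As a possible alternative (not part of the answer above), the same result can be obtained without apply by building a boolean mask: groupby(...).cumcount() numbers the rows within each group, so the first row of each group is where that count is 0. This may be worth considering because, as far as I know, recent pandas versions have changed how apply treats the grouping column, so the exact output of the apply version can differ by version. A sketch, with the data rebuilt by hand and the date_time column omitted for brevity:

```python
import pandas as pd

# Same rows as the CSV above, minus date_time, built directly for brevity.
df = pd.DataFrame({
    "list": [1, 1, 1, 1, 2, 2, 2],
    "name": ["Tom"] * 4 + ["Kate"] * 3,
    "note": [129, 0, 0, 45, 129, 0, 0],
    "value": [False, True, False, True, True, False, True],
})

# True for the first row of each 'list' group.
is_first = df.groupby("list").cumcount() == 0

# Drop rows that are both first in their group and False in 'value'.
to_drop = is_first & ~df["value"]
final = df[~to_drop].reset_index(drop=True)

print(final)
```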