Returning subset of each group from a pandas groupby object


Question

I have a multilevel dataframe that looks like this:

                      date_time      name  note   value
list index                                    
1    0     2015-05-22 05:37:59       Tom   129    False
     1     2015-05-22 05:38:59       Tom     0    True
     2     2015-05-22 05:39:59       Tom     0    False
     3     2015-05-22 05:40:59       Tom    45    True
2    4     2015-05-22 05:37:59       Kate   129    True
     5     2015-05-22 05:41:59       Kate     0    False
     5     2015-05-22 05:37:59       Kate     0    True

I want to iterate over list and, for the first row of each list group, check the value column; if it is False, delete that row. So the final goal is to delete all the first rows in list that have False in value. I use this code, which seems logical:

def delete_first_false():
    for list, new_df in df.groupby(level=0):
        for index, row in new_df.iterrows():
            new_df=new_df.groupby('name').first().loc([new_df['value']!='False'])
        return new_df
    return df

But I get this error:

AttributeError: '_LocIndexer' object has no attribute 'groupby'

Could you explain what's wrong with my method?

Recommended answer

Your general approach, using loops, rarely works the way you want in pandas.

If you have a groupby object, you should use the apply, agg, filter, or transform methods. In your case, apply is appropriate.
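To make the difference between those four methods concrete, here is a minimal sketch on a small made-up frame. The names `toy`, `key`, and `x` are hypothetical and are not part of the question's data; the sketch only illustrates the shape of result each method tends to return.

```python
import pandas as pd

# Hypothetical toy data, not the asker's dataframe
toy = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 4, 5],
})
grouped = toy.groupby("key")

# agg: reduces each group to a single value
totals = grouped["x"].agg("sum")

# transform: returns a result aligned with the original rows
group_means = grouped["x"].transform("mean")

# filter: keeps or drops whole groups based on a predicate
big_groups = grouped.filter(lambda g: len(g) > 2)

# apply: runs an arbitrary function on each group;
# here it drops the first row of every group
without_first = grouped[["x"]].apply(lambda g: g.iloc[1:])
```

Only apply lets the function return a differently sized piece of each group, which is why it fits the "drop some first rows" problem here.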

Your main goal is the following:

So the final goal is to delete all the first rows in (each group defined by) list that have False in (the) value (column).

So let's write a simple function to do just that on a single, stand-alone dataframe:

def filter_firstrow_falses(df):
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

OK. Simple enough.
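As a quick check, the function can be tried on stand-alone frames before involving any groups. The two small frames below (`starts_false`, `starts_true`) are made up for illustration; the function body is repeated from above so the snippet runs on its own.

```python
import pandas as pd

def filter_firstrow_falses(df):
    # Drop the first row only when its 'value' entry is False
    if not df['value'].iloc[0]:
        return df.iloc[1:]
    else:
        return df

# Hypothetical example frames
starts_false = pd.DataFrame({"name": ["Tom"] * 3, "value": [False, True, False]})
starts_true = pd.DataFrame({"name": ["Kate"] * 2, "value": [True, False]})

trimmed = filter_firstrow_falses(starts_false)    # first row removed
untouched = filter_firstrow_falses(starts_true)   # returned unchanged
```

Note that only the leading row is ever removed; later rows with False in value are kept.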

Now, let's apply that to each group of your real dataframe:

import pandas
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")

df = pandas.read_csv(csv)

final = (
    df.groupby(by=['list']) # create the groupby object
      .apply(filter_firstrow_falses) # apply our function to each group
      .reset_index(drop=True) # clean up the index
)
print(final)


   list            date_time  name  note  value
0     1  2015-05-22 05:38:59   Tom     0   True
1     1  2015-05-22 05:39:59   Tom     0  False
2     1  2015-05-22 05:40:59   Tom    45   True
3     2  2015-05-22 05:37:59  Kate   129   True
4     2  2015-05-22 05:41:59  Kate     0  False
5     2  2015-05-22 05:37:59  Kate     0   True
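For comparison, the same result can probably be reached without apply by building a boolean mask. This is an alternative sketch, not the answer's method. It assumes the value column was parsed as booleans (which read_csv does for True/False text) and uses `groupby(...).cumcount()` to number the rows inside each group. Depending on the pandas version, it may also sidestep warnings that `groupby.apply` can emit about operating on the grouping columns.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""\
list,date_time,name,note,value
1,2015-05-22 05:37:59,Tom,129,False
1,2015-05-22 05:38:59,Tom,0,True
1,2015-05-22 05:39:59,Tom,0,False
1,2015-05-22 05:40:59,Tom,45,True
2,2015-05-22 05:37:59,Kate,129,True
2,2015-05-22 05:41:59,Kate,0,False
2,2015-05-22 05:37:59,Kate,0,True
""")
df = pd.read_csv(csv)

# cumcount() numbers the rows within each group starting at 0,
# so 0 marks the first row of every 'list' group
is_first = df.groupby("list").cumcount() == 0

# Drop rows that are both first in their group and False in 'value'
final = df[~(is_first & ~df["value"])].reset_index(drop=True)
print(final)
```

The mask version keeps every column, including list, and avoids calling a Python function once per group.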
