How to remove some rows in a group by in Python

Question

I have a dataframe and I'd like to do a groupby() based on a column and then sort the values within each group by a date column. Then, from each group, I'd like to remove the records where column_condition == 'B' until I reach a row where column_condition == 'A'. For example, assume the table below is one of the groups:

ID, DATE, column_condition
--------------------------
1, jan 2017, B
1, Feb 2017, B
1, Mar 2017, B
1, Aug 2017, A
1, Sept 2017, B

So, I'd like to remove the first three rows and leave this group with only the last two rows. How can I do that?

Recommended answer

I think I finally understand your question: you wish to group a dataframe by 'ID', sort by date, and keep the rows from the first occurrence of 'A' in your condition column onward. I've come up with the following one-liner solution:

Setting up dummy data

import pandas as pd
import datetime as dt

d = {
    'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], # Assuming only two unique IDs for simplicity
    'DATE': [ # Dates already sorted, but it would work anyways
        dt.date(2018, 7, 19), dt.date(2018, 8, 18),
        dt.date(2018, 9, 17), dt.date(2018, 10, 17),
        dt.date(2018, 11, 16), dt.date(2018, 7, 19),
        dt.date(2018, 8, 18), dt.date(2018, 9, 17),
        dt.date(2018, 10, 17), dt.date(2018, 11, 16)
    ],
    'condition': ['B', 'B', 'B', 'A', 'B', 'B', 'B', 'B', 'A', 'B']
}
# 'DATE' but with list comprehension: 
# [dt.date.today() + dt.timedelta(days=30*x) for y in range(0, 2) for x in range(0, 5)]
df = pd.DataFrame(d)

Solution

>>> (df.sort_values(by='DATE') # we should call pd.to_datetime() first if...
...     .groupby('ID') # 'DATE' is not datetime already
...     .apply(lambda x: x[(x['condition'].values == 'A').argmax():]))

      ID        DATE condition
ID
1  3   1  2018-10-17         A
   4   1  2018-11-16         B
2  8   2  2018-10-17         A
   9   2  2018-11-16         B

You can also call reset_index(drop=True) if you need a flat index, like this:

   ID        DATE condition
0   1  2018-10-17         A
1   1  2018-11-16         B
2   2  2018-10-17         A
3   2  2018-11-16         B

(x['condition'].values == 'A') returns a boolean NumPy array, and calling argmax() on it gives us the index of the first occurrence of True (the first row where condition == 'A' in this case). Using that index, we subset each group with a slice.
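To see the argmax() step in isolation, here is a minimal sketch on a plain NumPy array. The variable names (conditions, first_a, and so on) are made up for illustration and are not part of the answer's code.

```python
import numpy as np

# One group's condition values, already sorted by date (illustrative data).
conditions = np.array(['B', 'B', 'B', 'A', 'B'])

mask = conditions == 'A'      # boolean array, True only where the value is 'A'
first_a = mask.argmax()       # position of the first True
kept = conditions[first_a:]   # slice from the first 'A' to the end

# Caveat: when no element is True, argmax() still returns 0,
# so the same slice would keep the entire group.
no_a = np.array(['B', 'B', 'B'])
first_when_missing = (no_a == 'A').argmax()
```

Here first_a is 3 and kept holds the last two values. The all-'B' case yields 0, which is why a group without any 'A' needs separate handling.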

Edit: added a filter for dealing with groups that only contain the undesired condition. Without it, argmax() returns 0 for such a group and the slice keeps every row.

d = {
    'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], # Assuming only two unique IDs for simplicity
    'DATE': [ # Dates already sorted, but it would work anyways
        dt.date(2018, 7, 19), dt.date(2018, 8, 18),
        dt.date(2018, 9, 17), dt.date(2018, 10, 17),
        dt.date(2018, 11, 16), dt.date(2018, 7, 19),
        dt.date(2018, 8, 18), dt.date(2018, 9, 17),
        dt.date(2018, 10, 17), dt.date(2018, 11, 16)
    ],
    'condition': ['B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'A', 'B'] # ID 1 only contains 'B'
}
df = pd.DataFrame(d)

Solution

>>> df
   ID        DATE condition
0   1  2018-07-19         B
1   1  2018-08-18         B
2   1  2018-09-17         B
3   1  2018-10-17         B
4   1  2018-11-16         B
5   2  2018-07-19         B
6   2  2018-08-18         B
7   2  2018-09-17         B
8   2  2018-10-17         A
9   2  2018-11-16         B

>>> (df.sort_values(by='DATE')
...    .groupby('ID')
...    .filter(lambda x: (x['condition'] == 'A').any())
...    .groupby('ID')
...    .apply(lambda x: x[(x['condition'].values == 'A').argmax():]))

     ID        DATE condition
ID
2  8   2  2018-10-17         A
   9   2  2018-11-16         B
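As an alternative to the filter-then-apply chain, the same result can probably be reached with a per-group cumulative count, which avoids groupby().apply() altogether. This is a sketch of a different technique, not the answer's method. The helper name keep_from_first_a and the small dataset are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data: ID 1 never has an 'A'; ID 2 has its first 'A' on the second date.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 2],
    'DATE': pd.to_datetime([
        '2018-07-19', '2018-08-18', '2018-09-17',
        '2018-07-19', '2018-08-18', '2018-09-17', '2018-10-17',
    ]),
    'condition': ['B', 'B', 'B', 'B', 'A', 'B', 'B'],
})

def keep_from_first_a(frame):
    """Keep, per ID, the rows from the first 'A' onward (hypothetical helper)."""
    ordered = frame.sort_values(['ID', 'DATE'])
    is_a = ordered['condition'].eq('A').astype(int)
    # Running count of 'A' rows within each ID; it stays 0 until the first 'A'.
    seen_a = is_a.groupby(ordered['ID']).cumsum().gt(0)
    return ordered[seen_a].reset_index(drop=True)

result = keep_from_first_a(df)
```

Because the running count never rises above 0 in a group that has no 'A', such groups are dropped without a separate filter() step. The 'ID' column also stays an ordinary column, so there is no nested index to reset.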
