Pandas advanced groupby and filter by date
Problem description
Given the input dataframe below, how can I produce the output dataframe? For each ID, I want to keep every row up to and including the first row where target == 1 and drop everything after it. In other words, remove the rows that follow the first target == 1 for each ID, while keeping all the 0s in target that come before it.
Input
ID date target
a1 2019-11-01 0
a1 2019-12-01 0
a1 2020-01-01 1
a1 2020-02-01 1
a1 2020-03-01 0
a2 2019-11-01 0
a2 2019-12-01 1
a2 2020-03-01 0
a2 2020-04-01 1
Output
ID date target
a1 2019-11-01 0
a1 2019-12-01 0
a1 2020-01-01 1
a2 2019-11-01 0
a2 2019-12-01 1
Recommended answer
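from io import StringIO

import numpy as np
import pandas as pd

data = StringIO("""
uid, date, target
a1, 2019-11-01, 0
a1, 2019-12-01, 0
a1, 2020-01-01, 1
a1, 2020-02-01, 1
a1, 2020-03-01, 0
a2, 2019-11-01, 0
a2, 2019-12-01, 1
a2, 2020-03-01, 0
a2, 2020-04-01, 1
""")

# skipinitialspace drops the blank after each comma, so headers and values come out clean
df = pd.read_csv(data, skipinitialspace=True)

def filter_in_group(group: pd.DataFrame) -> pd.DataFrame:
    # Position of the first target == 1 within the group (target is assumed to be 0/1).
    # argmax returns 0 when the group has no 1 at all, so only its first row is kept.
    ind = np.argmax(group['target'].to_numpy())
    return group.loc[:, ['date', 'target']].iloc[:ind + 1]

df_filtered = (
    df
    .groupby('uid')
    .apply(filter_in_group)
    .reset_index()
    .drop('level_1', axis=1)
)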
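The same filter can probably also be written without groupby.apply, by counting per ID how many rows with target == 1 occur strictly before each row and keeping the rows where that count is zero. The following is only a sketch of that alternative, not part of the original answer. The helper name keep_through_first_hit is made up for illustration, and it assumes rows are already sorted by date within each ID and that target only holds 0 or 1. Unlike the argmax version, an ID that never reaches target == 1 is left unchanged here, which may or may not be what you want.

```python
import pandas as pd

# Same rows as the input table in the question.
df = pd.DataFrame({
    "uid": ["a1"] * 5 + ["a2"] * 4,
    "date": pd.to_datetime([
        "2019-11-01", "2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01",
        "2019-11-01", "2019-12-01", "2020-03-01", "2020-04-01",
    ]),
    "target": [0, 0, 1, 1, 0, 0, 1, 0, 1],
})


def keep_through_first_hit(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows up to and including the first target == 1 per uid."""
    hit = frame["target"].eq(1).astype(int)
    # Running count of hits per uid, minus the current row's own hit,
    # gives the number of hits strictly before each row.
    hits_before = hit.groupby(frame["uid"]).cumsum() - hit
    return frame[hits_before.eq(0)].reset_index(drop=True)


result = keep_through_first_hit(df)
print(result)
```

Because the mask is built from a grouped cumulative sum, the original column layout is preserved and there is no extra index level to drop afterwards.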