Removing the first rows before a specific value - pandas

Problem description

I am trying to remove all rows before an initial value for a group. For instance, if my max_value = 250, then all rows for a group before the first value above that threshold should be removed. If a subsequent value of 250 or less appears again for that group, it is not removed.

import pandas as pd
df = pd.DataFrame({
    'date': ['2019-01-01','2019-02-01','2019-03-01', '2019-04-01',
             '2019-01-01','2019-02-01','2019-03-01', '2019-04-01',
             '2019-01-01','2019-02-01','2019-03-01', '2019-04-01'],
    'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B',
             'Asset B', 'Asset B', 'Asset B', 'Asset B'],
    'Monthly Value': [100, 200, 300, 400, 500, 600, 100, 200, 300, 200, 300, 200]
})

unique_list = list(df['Asset'].unique())
max_value = 250
print(df)

          date    Asset  Monthly Value
0   2019-01-01  Asset A            100
1   2019-02-01  Asset A            200
2   2019-03-01  Asset A            300
3   2019-04-01  Asset A            400
4   2019-01-01  Asset A            500
5   2019-02-01  Asset A            600
6   2019-03-01  Asset B            100
7   2019-04-01  Asset B            200
8   2019-01-01  Asset B            300
9   2019-02-01  Asset B            200
10  2019-03-01  Asset B            300
11  2019-04-01  Asset B            200

If the threshold or max_value is 250, then the dataframe should look like this (below). Notice that for each group, the leading rows with values under 250 are removed. Once a value of 250 or higher has appeared, every later row in that group is kept, even if it falls below 250 again (rows 9 and 11). Any help would be appreciated.

          date    Asset  Monthly Value
2   2019-03-01  Asset A            300
3   2019-04-01  Asset A            400
4   2019-01-01  Asset A            500
5   2019-02-01  Asset A            600
8   2019-01-01  Asset B            300
9   2019-02-01  Asset B            200
10  2019-03-01  Asset B            300
11  2019-04-01  Asset B            200

Recommended answer

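This should do the trick. Passing group_keys=False keeps the original row index on the mask, which pandas 2.0 and later need for the boolean selection to align with df:

df[df.groupby('Asset', group_keys=False)['Monthly Value'].apply(lambda x: x.gt(max_value).cumsum().ne(0))]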

Yields:

          date    Asset  Monthly Value
2   2019-03-01  Asset A            300
3   2019-04-01  Asset A            400
4   2019-01-01  Asset A            500
5   2019-02-01  Asset A            600
8   2019-01-01  Asset B            300
9   2019-02-01  Asset B            200
10  2019-03-01  Asset B            300
11  2019-04-01  Asset B            200
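To see why the mask works, it can help to split it into its three steps. The following is a minimal sketch rather than the answerer's exact code: it assumes the same data as the question (the date column is left out for brevity) and uses the intermediate names above, seen and keep, which are introduced here for illustration only.

```python
import pandas as pd

# Same values as in the question; the 'date' column is omitted for brevity.
df = pd.DataFrame({
    'Asset': ['Asset A'] * 6 + ['Asset B'] * 6,
    'Monthly Value': [100, 200, 300, 400, 500, 600,
                      100, 200, 300, 200, 300, 200],
})
max_value = 250

# Step 1: flag rows whose value is strictly greater than the threshold.
above = df['Monthly Value'].gt(max_value)

# Step 2: running count of flagged rows within each asset.
# The count stays at 0 until the first flagged row of the group.
seen = above.astype(int).groupby(df['Asset']).cumsum()

# Step 3: keep every row from the first flagged row onward.
keep = seen.ne(0)

result = df[keep]
print(result)
```

Because the running count never drops back to 0, later rows below the threshold (such as rows 9 and 11) stay in the result.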

Additionally, if you store your max values in a dictionary like max_value = {'Asset A': 250, 'Asset B': 250}, you can do the following to achieve the same result. Inside apply, x.name is the group key, so each asset is compared against its own threshold:

df[df.groupby('Asset', group_keys=False)['Monthly Value'].apply(lambda x: x.gt(max_value[x.name]).cumsum().ne(0))]
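The dictionary form is most useful when the thresholds actually differ between assets. As a hedged sketch, the example below assumes hypothetical thresholds (350 for Asset A, 250 for Asset B, values not taken from the question) and swaps the x.name lookup for Series.map, which builds a per-row threshold column before the comparison. The names thresholds, limit and keep are illustrative.

```python
import pandas as pd

# Same values as in the question; the 'date' column is omitted for brevity.
df = pd.DataFrame({
    'Asset': ['Asset A'] * 6 + ['Asset B'] * 6,
    'Monthly Value': [100, 200, 300, 400, 500, 600,
                      100, 200, 300, 200, 300, 200],
})

# Hypothetical per-asset thresholds, chosen only to show differing limits.
thresholds = {'Asset A': 350, 'Asset B': 250}

# Look up each row's threshold from its asset name.
limit = df['Asset'].map(thresholds)

# Flag rows above their own threshold, then keep everything from the
# first flagged row of each asset onward.
above = df['Monthly Value'].gt(limit)
keep = above.astype(int).groupby(df['Asset']).cumsum().ne(0)

result = df[keep]
print(result)
```

With these assumed thresholds, Asset A starts at row 3 (value 400) while Asset B still starts at row 8.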
