How to get the minimum of each group for each day based on hour criteria

Problem description

I have given two dataframes below for you to test:

import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 17:00:00', '2173-04-03 20:00:00',
               '2173-04-04 11:00:00', '2173-04-04 11:30:00', '2173-04-04 12:00:00',
               '2173-04-05 16:00:00', '2173-04-05 22:00:00', '2173-04-06 04:00:00',
               '2173-04-06 04:30:00', '2173-04-06 06:30:00'],
    'val': [5, 5, 5, 10, 5, 10, 5, 8, 3, 8, 10]
})


df1 = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-03 12:59:00',
               '2173-04-03 13:14:00', '2173-04-03 13:37:00', '2173-04-04 11:30:00',
               '2173-04-05 16:00:00', '2173-04-05 22:00:00', '2173-04-06 04:00:00',
               '2173-04-06 04:30:00', '2173-04-06 08:00:00'],
    'val': [5, 5, 5, 5, 10, 5, 5, 8, 3, 4, 6]
})

What I would like to do is:

1) Find all values (from the val column) which have been the same for more than 1 hour in each day for each subject_id, and get the minimum of them.

Please note that values can also be captured every 15 minutes, so you might have to consider 5 records to check the > 1 hr condition. See the sample screenshot below.

2) If there are no values which were the same for more than 1 hour in a day, then just get the minimum of that day for that subject_id.
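To make the "same for more than 1 hour" condition concrete, here is a minimal sketch of how runs of identical consecutive readings and their durations could be computed. The toy data and the column names `day`, `run_id`, `run_minutes`, and `held_over_1h` are hypothetical, and the sketch assumes a run's duration is measured from its first to its last reading within the same day.

```python
import pandas as pd

# Hypothetical toy data: one subject, readings roughly every 15 minutes.
toy = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': pd.to_datetime([
        '2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-03 13:05:00',
        '2173-04-03 13:20:00', '2173-04-03 13:40:00', '2173-04-03 14:00:00',
    ]),
    'val': [5, 5, 5, 5, 5, 10],
})

toy['day'] = toy['time_1'].dt.normalize()

# A new run starts whenever the value, the subject, or the day changes.
changed = (
    toy['val'].ne(toy['val'].shift())
    | toy['subject_id'].ne(toy['subject_id'].shift())
    | toy['day'].ne(toy['day'].shift())
)
toy['run_id'] = changed.cumsum()

# Duration of each run: last timestamp minus first timestamp, in minutes.
run_time = toy.groupby('run_id')['time_1']
toy['run_minutes'] = (
    run_time.transform('max') - run_time.transform('min')
).dt.total_seconds() / 60

# Rule 1 would apply only to rows whose run lasted longer than 60 minutes.
toy['held_over_1h'] = toy['run_minutes'] > 60
```

Here the five readings of 5 span 12:35 to 13:40 (65 minutes), so they satisfy the condition, while the single reading of 10 does not.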

The screenshot below for one subject will help you understand, and the code I tried is given below.

This is what I tried:

df['time_1'] = pd.to_datetime(df['time_1'])
# Time of the next reading and the gap to it, in hours
df['time_2'] = df['time_1'].shift(-1)
df['tdiff'] = (df['time_2'] - df['time_1']).dt.total_seconds() / 3600
df['reading_day'] = pd.DatetimeIndex(df['time_1']).day

# don't know how to apply if else condition here to check for 1 hr criteria
t1 = df.groupby(['subject_id', 'reading_day', 'tdiff'])['val'].min()

As I have to apply this to millions of records, any elegant and efficient solution would be helpful.

Recommended answer

Try this.

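import pandas as pd
from datetime import timedelta

def f(x):
    # Duration of a run in whole minutes (last timestamp minus first)
    return (x.iloc[-1] - x.iloc[0]) // timedelta(minutes=1)

df1['time_1'] = pd.to_datetime(df1['time_1'])
# Label runs of consecutive identical values
df1['flag'] = df1.val.diff().ne(0).cumsum()
df1['t_d'] = df1.groupby('flag')['time_1'].transform(f)
df1['date'] = df1['time_1'].dt.date
# True for rows whose run has a nonzero duration (more than one reading time)
mask = df1['t_d'].ne(0)
# First row of each repeated-value run per day
dfa = df1[mask].groupby(['flag', 'date']).first().reset_index()
# First row per day among the remaining rows
dfb = df1[~mask].groupby('date').first().reset_index().dropna(how='any')
df_f = dfa.merge(dfb, how='outer')
# Keep one row per day, preferring the repeated-value runs
df_f.drop_duplicates(subset='date', keep='first', inplace=True)
df_f.drop(['flag', 'date', 't_d'], axis=1, inplace=True)
df_f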

Output:

   subject_id               time_1  val
0           1  2173-04-03 12:35:00    5
1           1  2173-04-04 11:30:00    5
2           1  2173-04-05 16:00:00    5
5           1  2173-04-06 04:00:00    3
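Note that the code above treats any run with a nonzero duration as qualifying and keeps the first row of each group, rather than testing for more than 60 minutes and taking a minimum. If the strict criterion from the question is needed, one possible vectorized sketch follows. The helper name `daily_min_with_hold`, its `hold` parameter, and the demo data are hypothetical, and the sketch assumes a run's duration is measured from its first to its last reading within the same subject and day, which may differ from what the original screenshot intended.

```python
import pandas as pd

def daily_min_with_hold(frame, hold=pd.Timedelta(hours=1)):
    """Hypothetical helper: per subject and day, return the minimum of the
    values held longer than `hold`, else the plain daily minimum."""
    out = frame.copy()
    out['time_1'] = pd.to_datetime(out['time_1'])
    out = out.sort_values(['subject_id', 'time_1']).reset_index(drop=True)
    out['date'] = out['time_1'].dt.normalize()

    # A new run starts whenever subject, day, or value differs from the previous row.
    changed = pd.Series(False, index=out.index)
    for col in ['subject_id', 'date', 'val']:
        changed |= out[col].ne(out[col].shift())
    out['run_id'] = changed.cumsum()

    # A run is "held" if its last reading is more than `hold` after its first.
    run_time = out.groupby('run_id')['time_1']
    out['held'] = (run_time.transform('max') - run_time.transform('min')) > hold

    keys = [out['subject_id'], out['date']]
    daily_min = out['val'].groupby(keys).min()
    # Minimum over held rows only; days without any held run become NaN.
    held_min = out['val'].where(out['held']).groupby(keys).min()

    # Prefer the held minimum where one exists, else fall back to the daily minimum.
    result = held_min.fillna(daily_min).astype(daily_min.dtype)
    return result.rename('val').reset_index()

# Hypothetical demo data: 7 is held for two hours on the first day,
# while the second day has no value held for more than an hour.
demo = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 10:00:00', '2173-04-03 11:00:00', '2173-04-03 12:00:00',
               '2173-04-03 13:00:00', '2173-04-04 09:00:00', '2173-04-04 09:30:00'],
    'val': [7, 7, 7, 3, 4, 6],
})
result = daily_min_with_hold(demo)
# val is 7 for 2173-04-03 (held value) and 4 for 2173-04-04 (daily minimum)
```

Because every step is a column operation or a groupby aggregation, this should scale better on large inputs than a Python-level loop over groups, although that has not been measured here.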
