How to do cumsum based on a time condition - resample pandas?


Question

I have a dataframe like the one shown below:

import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00',
               '2173-04-03 10:30:00', '2173-04-03 10:45:00',
               '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
    'val': [5, 6, 5, 6, 6, 6]
})

I would like to find the total duration for which a value appears in sequence. The example below should help you understand.

In the data above, you can see that 6 occurs in sequence from 10:45 until the end of the day (23:59), whereas the other values (which could be any values in real data) do not occur in sequence at all.

I tried something like this, but it doesn't give the expected output. It cumsums all values:

df['time_1'] = pd.to_datetime(df['time_1'])
df['seq'] = df['val'] == df['val'].shift(-1)

s = pd.to_timedelta(24, unit='h') - (df.time_1 - df.time_1.dt.normalize())
df['tdiff'] = df.groupby(df.time_1.dt.date).time_1.diff().shift(-1).fillna(s).dt.total_seconds() / 3600
df.groupby([df['seq'] == True])['tdiff'].cumsum()  # do cumulative sum only when the values are in sequence

How can I do a cumulative sum over a group based on a condition?

In my expected output, the row for 6 shows 13:15, because no other value appears in the data during the 13 hours 15 minutes that follow the first occurrence of 6 at 10:45 (24:00 - 10:45 gives 13:15).
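The 24:00 - 10:45 arithmetic can be checked with a small sketch. The variable names here are illustrative only and do not come from the question.

```python
import pandas as pd

# First occurrence of the final run of 6 in the sample data
t = pd.Timestamp("2173-04-03 10:45:00")

# Time elapsed since midnight of the same day
elapsed = t - t.normalize()

# Time remaining until the end of that day
remaining = pd.Timedelta(hours=24) - elapsed
print(remaining)  # 0 days 13:15:00
```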

Test dataframe

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00',
               '2173-04-03 12:59:00', '2173-04-03 13:14:00',
               '2173-04-03 13:37:00', '2173-04-04 11:30:00',
               '2173-04-05 16:00:00', '2173-04-05 22:00:00',
               '2173-04-06 04:00:00', '2173-04-06 04:30:00',
               '2173-04-06 08:00:00'],
    'val': [5, 5, 5, 5, 10, 5, 5, 8, 3, 4, 6]
})

Recommended answer

IIUC, try with:

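# Keep the first row of each run of equal values (time_1 must already be datetime)
m = df.groupby(df.val.ne(df.val.shift()).cumsum()).first().rename_axis(None)
# Time left until midnight from each run start
c = pd.to_timedelta(24, unit='h') - (m.time_1 - m.time_1.dt.normalize())
# Gap to the next run start; the last run is filled with the time left until midnight
final = m.assign(cumsum=m.time_1.diff().shift(-1).fillna(c))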


   subject_id              time_1  val   cumsum
1           1 2173-04-03 10:00:00    5 00:15:00
2           1 2173-04-03 10:15:00    6 00:15:00
3           1 2173-04-03 10:30:00    5 00:15:00
4           1 2173-04-03 10:45:00    6 13:15:00

Details:

df.val.ne(df.val.shift()).cumsum() checks, row by row, whether the value differs from the previous row, and gives consecutive equal values the same group number.
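As a minimal sketch of that grouping step on the `val` column of the first dataframe (the names `changed` and `run_id` are illustrative, not from the answer):

```python
import pandas as pd

val = pd.Series([5, 6, 5, 6, 6, 6])

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True)
changed = val.ne(val.shift())

# The running count of changes gives one id per run of equal values
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 2, 3, 4, 4, 4]
```

The last three rows share the id 4 because 6 repeats without interruption.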

Based on these groups, we group the rows and take the first entry of each group. We then take diff() of time_1 and shift it up by one position so that each gap lines up with the row where its run starts. The one remaining missing value (the last run) is filled with the time left until 24:00.
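Putting the steps together, here is a self-contained sketch run on the first dataframe from the question. The names `run_id`, `first`, `to_midnight` and `duration` are illustrative. Note that this measures the gap between the start of one run and the start of the next, and that nothing in it resets at day boundaries, so data spanning several days may need extra handling.

```python
import pandas as pd

df = pd.DataFrame({
    "subject_id": [1, 1, 1, 1, 1, 1],
    "time_1": pd.to_datetime([
        "2173-04-03 10:00:00", "2173-04-03 10:15:00",
        "2173-04-03 10:30:00", "2173-04-03 10:45:00",
        "2173-04-03 11:05:00", "2173-04-03 11:15:00",
    ]),
    "val": [5, 6, 5, 6, 6, 6],
})

# One id per run of consecutive equal values
run_id = df["val"].ne(df["val"].shift()).cumsum().rename("run_id")

# First row of every run
first = df.groupby(run_id).first().rename_axis(None)

# Time from each run start until midnight of the same day
to_midnight = pd.Timedelta(hours=24) - (
    first["time_1"] - first["time_1"].dt.normalize()
)

# Gap to the next run start, aligned to the current run;
# the last run has no successor, so it gets the time until midnight
duration = first["time_1"].diff().shift(-1).fillna(to_midnight)

result = first.assign(duration=duration)
print(result)
```

On this input, the final run of 6 (starting at 10:45) gets a duration of 13:15:00 and the earlier runs get 00:15:00 each, matching the output table in the answer.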
