How to do cumsum based on a time condition - resample pandas?
Problem description
I have a dataframe as shown below:
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00', '2173-04-03 10:30:00',
               '2173-04-03 10:45:00', '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
    'val': [5, 6, 5, 6, 6, 6]
})
I would like to find the total duration for which a value appears in sequence. The example below should help you understand.
From the above screenshot, you can see that 6 occurs in sequence from 10:45 to 23:59, whereas the other values (they could be any values in the real data) are not in sequence at all.
I tried something like this, but it doesn't give the expected output. It cumsums all values:
df['time_1'] = pd.to_datetime(df['time_1'])
df['seq'] = df['val'] == df['val'].shift(-1)
s=pd.to_timedelta(24,unit='h')-(df.time_1-df.time_1.dt.normalize())
df['tdiff'] =df.groupby(df.time_1.dt.date).time_1.diff().shift(-1).fillna(s).dt.total_seconds()/3600
df.groupby([df['seq'] == True])['tdiff'].cumsum() # do cumulative sum only when the values are in sequence
How can I do a cumulative sum on a group based on a condition?
I expect my output to be as shown below. You see 13:15 because we don't see any other value in our data for the next 13:15 hours after the first occurrence of 6, which is at 10:45 (24:00 - 10:45 gives 13:15).
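The 13:15 figure can be checked with a small sketch. It assumes the duration of the last run is measured up to midnight of the same day, as described above; the variable names are only illustrative.

```python
import pandas as pd

# First occurrence of the final run of 6 in the sample data.
ts = pd.Timestamp('2173-04-03 10:45:00')

# Time elapsed since midnight of the same day.
elapsed = ts - ts.normalize()

# Time left until the end of the day (24:00).
remaining = pd.Timedelta(hours=24) - elapsed

print(remaining)  # 0 days 13:15:00
```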
Test dataframe
df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 12:35:00', '2173-04-03 12:50:00', '2173-04-03 12:59:00',
               '2173-04-03 13:14:00', '2173-04-03 13:37:00', '2173-04-04 11:30:00',
               '2173-04-05 16:00:00', '2173-04-05 22:00:00', '2173-04-06 04:00:00',
               '2173-04-06 04:30:00', '2173-04-06 08:00:00'],
    'val': [5, 5, 5, 5, 10, 5, 5, 8, 3, 4, 6]
})
Recommended answer
IIUC, try with:
m=df.groupby(df.val.ne(df.val.shift()).cumsum()).first().rename_axis(None)
c=pd.to_timedelta(24,unit='h')-(m.time_1-m.time_1.dt.normalize())
final=m.assign(cumsum=m.time_1.diff().shift(-1).fillna(c))
   subject_id              time_1  val   cumsum
1           1 2173-04-03 10:00:00    5 00:15:00
2           1 2173-04-03 10:15:00    6 00:15:00
3           1 2173-04-03 10:30:00    5 00:15:00
4           1 2173-04-03 10:45:00    6 13:15:00
Details:
df.val.ne(df.val.shift()).cumsum() checks whether the value changes from one row to the next and puts consecutive equal values into a single group.
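To make the grouping key concrete, here is a minimal sketch on the `val` column from the question's first dataframe. The names `changed` and `run_id` are illustrative and do not appear in the answer.

```python
import pandas as pd

val = pd.Series([5, 6, 5, 6, 6, 6])

# True wherever the value differs from the previous row. The first row is
# always True because shift() places NaN there.
changed = val.ne(val.shift())

# The running total of those flags gives each consecutive run its own id.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 2, 3, 4, 4, 4]
```

The last three rows share id 4, so they are treated as one run of 6.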
Using this key, we group the rows and take the first entry of each group. Then we take diff() of time_1 and shift it up one position so that each difference lines up with the row where its run starts. The last row has no following row, so fillna fills it with the time remaining until 24:00 on that day.
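The same steps can be written out one at a time. This is a sketch of the approach above applied to the question's first dataframe, with `time_1` converted to datetime first; the intermediate names (`run_id`, `first`, `to_next`, `to_midnight`, `duration`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 1, 1],
    'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00',
               '2173-04-03 10:30:00', '2173-04-03 10:45:00',
               '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
    'val': [5, 6, 5, 6, 6, 6],
})
df['time_1'] = pd.to_datetime(df['time_1'])

# Step 1: one id per run of consecutive equal values.
run_id = df['val'].ne(df['val'].shift()).cumsum()

# Step 2: keep the first row of every run.
first = df.groupby(run_id).first().rename_axis(None)

# Step 3: gap from each run's start to the next run's start.
to_next = first['time_1'].diff().shift(-1)

# Step 4: the last run has no successor, so use the time left until midnight.
to_midnight = pd.Timedelta(hours=24) - (
    first['time_1'] - first['time_1'].dt.normalize()
)
duration = to_next.fillna(to_midnight)

print(first.assign(cumsum=duration))
```

Only the last row is filled from `to_midnight`, because it is the only one where `to_next` is missing.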