Count consecutive values in a time series
Question
It's my first time asking a question here, so I hope I'm doing it right!
I have a pandas DataFrame:
df2.data
Out[66]:
date
2016-01-02 0.0
2016-01-03 1.0
2016-01-04 1.0
2016-01-05 1.0
2016-01-06 0.0
2016-01-07 0.0
2016-01-08 1.0
2016-01-09 2.0
2016-01-10 1.0
2016-01-11 0.0
Name: data, dtype: float64
I would like the following result:
data trend trend_type
date
2016-01-02 0.0 0 0
2016-01-03 1.0 0 0
2016-01-04 1.0 1 1
2016-01-05 1.0 2 1
2016-01-06 0.0 0 0
2016-01-07 0.0 1 0
2016-01-08 1.0 0 0
2016-01-09 2.0 0 0
2016-01-10 1.0 0 0
2016-01-11 0.0 0 0
So far, I have managed to compute the trend column, but my approach is not efficient enough (about 8 seconds for a 750-row DataFrame):
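import time

# Flag rows whose value equals the previous row's value
df['grp'] = (df.close.diff(1) == 0).astype('int')
df['trend'] = 0
start_time = time.time()
trend_col = df.columns.get_loc('trend')
for i in range(1, len(df)):
    if df.grp.iloc[i] == 1:
        # Extend the current run by one (single indexer, no chained assignment)
        df.iloc[i, trend_col] = df.iloc[i - 1, trend_col] + 1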
Recommended answer
Step 1
To get trend, perform a groupby + cumcount:
df['trend'] = df.data.groupby(df.data.ne(df.data.shift()).cumsum()).cumcount()
df
data trend
2016-01-02 0.0 0
2016-01-03 1.0 0
2016-01-04 1.0 1
2016-01-05 1.0 2
2016-01-06 0.0 0
2016-01-07 0.0 1
2016-01-08 1.0 0
2016-01-09 2.0 0
2016-01-10 1.0 0
2016-01-11 0.0 0
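To make the one-liner in Step 1 easier to follow, it can be unpacked into its intermediate steps. This is only a minimal sketch, assuming the sample data from the question; the variable names changed, run_id, and trend are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data from the question
idx = pd.date_range("2016-01-02", periods=10, freq="D", name="date")
data = pd.Series([0., 1., 1., 1., 0., 0., 1., 2., 1., 0.], index=idx, name="data")

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True)
changed = data.ne(data.shift())

# The cumulative sum of the change flags gives each run of equal values its own id
run_id = changed.cumsum()

# Position within each run: 0 for the first row of a run, 1 for the second, ...
trend = data.groupby(run_id).cumcount()

print(pd.DataFrame({"data": data, "run_id": run_id, "trend": trend}))
```

Because every step is a vectorized pandas operation, there is no Python-level loop over the rows, which is where the approach in the question spends its time.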
Step 2
(IIUC) To get trend_type, compare consecutive rows and assign.
df['trend_type'] = 0
m = df.data.eq(df.data.shift())
df.loc[m, 'trend_type'] = df.loc[m, 'data']
df
data trend trend_type
2016-01-02 0.0 0 0.0
2016-01-03 1.0 0 0.0
2016-01-04 1.0 1 1.0
2016-01-05 1.0 2 1.0
2016-01-06 0.0 0 0.0
2016-01-07 0.0 1 0.0
2016-01-08 1.0 0 0.0
2016-01-09 2.0 0 0.0
2016-01-10 1.0 0 0.0
2016-01-11 0.0 0 0.0
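As a possible variation on Step 2, the same column can be built in a single expression with Series.where, which keeps the value where the mask is True and substitutes 0 elsewhere. This swaps the answer's .loc assignment for a different technique; it is a sketch assuming the sample data from the question, not the answerer's own code.

```python
import pandas as pd

# Sample data from the question
idx = pd.date_range("2016-01-02", periods=10, freq="D", name="date")
df = pd.DataFrame({"data": [0., 1., 1., 1., 0., 0., 1., 2., 1., 0.]}, index=idx)

# True where a row repeats the previous row's value
m = df["data"].eq(df["data"].shift())

# Keep the repeated value where m is True, otherwise fall back to 0
df["trend_type"] = df["data"].where(m, 0)

print(df)
```

This avoids creating an integer column first and then writing float values into it, so the column has a float dtype from the start.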