Resampling timeseries with a given timedelta


Problem description

I am using pandas to structure and process data. This is my DataFrame:

I want to resample the time-series data so that, for every ID (named here "3"), I have all bitrate scores from beginning to end (beginning_time / end_time). For example, for the first row, I want one row for every second from 2016-07-08 02:17:42 to 2016-07-08 02:17:55, with the same bitrate score and, of course, the same ID. Something like this:

For example, given:

import pandas as pd

df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
     'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
     'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
     'bitrate': ['3750000', '3750000']})

which gives:

And for the first row I want to have:

Same thing for the second row. So the objective is to resample over the time delta between the beginning and end times; the bitrate score must of course stay the same.

I'm trying this code:

df['new_beginning_time'] = pd.to_datetime(df['beginning_time'])
df.set_index('new_beginning_time').groupby('Id', group_keys=False).apply(lambda df: df.resample('S').ffill()).reset_index()

But in this context, it didn't work! Any ideas? Thank you very much!

Solution

You can use melt with resample (pandas version 0.18.1):

df.beginning_time = pd.to_datetime(df.beginning_time)
df.end_time = pd.to_datetime(df.end_time)
df = pd.melt(df, id_vars=['Id','bitrate'], value_name='dates').drop('variable', axis=1)
df.set_index('dates', inplace=True)
print(df)
                                   Id  bitrate
dates                                         
2016-07-08 02:17:42  CODI126640013.ts  3750000
2016-07-08 02:05:35  CODI126622312.ts  3750000
2016-07-08 02:17:55  CODI126640013.ts  3750000
2016-07-08 02:26:11  CODI126622312.ts  3750000

print (df.groupby('Id').resample('1S').ffill())
                                                    Id  bitrate
Id               dates                                         
CODI126622312.ts 2016-07-08 02:05:35  CODI126622312.ts  3750000
                 2016-07-08 02:05:36  CODI126622312.ts  3750000
                 2016-07-08 02:05:37  CODI126622312.ts  3750000
                 2016-07-08 02:05:38  CODI126622312.ts  3750000
                 2016-07-08 02:05:39  CODI126622312.ts  3750000
                 2016-07-08 02:05:40  CODI126622312.ts  3750000
                 2016-07-08 02:05:41  CODI126622312.ts  3750000
                 2016-07-08 02:05:42  CODI126622312.ts  3750000
                 2016-07-08 02:05:43  CODI126622312.ts  3750000
                 2016-07-08 02:05:44  CODI126622312.ts  3750000
                 2016-07-08 02:05:45  CODI126622312.ts  3750000
                 2016-07-08 02:05:46  CODI126622312.ts  3750000
                 2016-07-08 02:05:47  CODI126622312.ts  3750000
                 ...
                 ...
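
If melt plus groupby/resample behaves differently on your pandas version, a different technique may be easier to follow: build the per-second timestamps for each row directly with pd.date_range. This is only a minimal sketch, not the method shown above. The helper name expand_to_seconds and its step parameter are made up for illustration, and it assumes both endpoints should be included.

```python
import pandas as pd

df = pd.DataFrame(
    {'Id': ['CODI126640013.ts', 'CODI126622312.ts'],
     'beginning_time': ['2016-07-08 02:17:42', '2016-07-08 02:05:35'],
     'end_time': ['2016-07-08 02:17:55', '2016-07-08 02:26:11'],
     'bitrate': ['3750000', '3750000']})


def expand_to_seconds(frame, step=pd.Timedelta(seconds=1)):
    """Hypothetical helper: one output row per `step` between
    beginning_time and end_time (both endpoints included)."""
    pieces = []
    for row in frame.itertuples(index=False):
        # All timestamps covered by this row, spaced `step` apart.
        dates = pd.date_range(pd.to_datetime(row.beginning_time),
                              pd.to_datetime(row.end_time),
                              freq=step)
        # Id and bitrate are scalars, so they are repeated for every timestamp.
        pieces.append(pd.DataFrame({'Id': row.Id,
                                    'dates': dates,
                                    'bitrate': row.bitrate}))
    return pd.concat(pieces, ignore_index=True)


expanded = expand_to_seconds(df)
print(expanded.head())
```

Passing a pd.Timedelta as freq avoids relying on the 'S' / 's' frequency aliases, which are spelled differently across pandas versions. The loop over rows is fine for a handful of intervals; for a large frame the melt/resample approach above is likely to scale better.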
