Pandas rolling median for duplicate time series data


Question

I see that pandas does not yet allow duplicate time series indexes (https://github.com/pydata/pandas/issues/643), but that support is supposed to be added soon. I am wondering whether there is a good way to apply rolling window means to a dataset with duplicate times, grouped by a multi-index tag/column.

Basically, I have a CSV of unordered events, each consisting of an epoch time, hierarchical tags (tag1, tag2), and the time taken. A small sample:

 epochTimeMS,event,tag,timeTakenMS
 1331782842801,event1,tag1,16
 1331782841535,event1,tag2,1278
 1331782842801,event1,tag1,17
 1331782842381,event2,tag1,436

What I want to do is build and graph rolling means with varying millisecond windows, by event and by event+tag. This seems like something pandas should handle, but I am not sure whether I need to wait for duplicate time-series indexes to be supported first. Any thoughts on hacking this in place now?

Recommended answer

There's nothing really to stop you right now:

In [17]: idf = df.set_index(['tag', 'epochTimeMS'], verify_integrity=False).sort_index()

In [18]: idf
Out[18]: 
                     event  timeTakenMS
tag  epochTimeMS                       
tag1 1331782842381  event2          436
     1331782842801  event1           16
     1331782842801  event1           17
tag2 1331782841535  event1         1278

In [20]: idf.loc['tag1']
Out[20]: 
                event  timeTakenMS
epochTimeMS                       
1331782842381  event2          436
1331782842801  event1           16
1331782842801  event1           17
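
The session above starts from an existing `df`. As a minimal, self-contained sketch, the same steps can be reproduced from the four sample rows in the question. Reading the rows through `io.StringIO` and the variable names `CSV_TEXT` and `tag1` are assumptions made for illustration; they are not part of the original answer.

```python
import io

import pandas as pd

# The four sample rows from the question, held as CSV text (illustrative only).
CSV_TEXT = """epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# verify_integrity=False lets duplicate (tag, epochTimeMS) pairs stay in the index.
idf = df.set_index(["tag", "epochTimeMS"], verify_integrity=False).sort_index()

# .loc is the label-based indexer in current pandas (older releases also offered .ix).
# Selecting one tag leaves epochTimeMS as the remaining index level.
tag1 = idf.loc["tag1"]
print(tag1)
```

Both rows that share the timestamp 1331782842801 survive the indexing step, so they remain available for any later aggregation.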

Accessing specific values by timestamp will raise an exception (this is going to be improved, as you mention), but you can certainly work with the data. If you want a fixed-length window in time space, that is not supported very well yet, but I created an issue for it here:

https://github.com/pydata/pandas/issues/936
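
Later pandas releases accept a time offset such as `"500ms"` as the `rolling` window, provided the index is a sorted `DatetimeIndex`; duplicate timestamps are allowed. The sketch below shows one way the per-event and per-event+tag rolling means from the question could be computed under that assumption. The `time` column, the helper `rolling_mean_ms`, and the 500 ms window are hypothetical choices for illustration, not something the original answer specified.

```python
import io

import pandas as pd

# The four sample rows from the question, held as CSV text (illustrative only).
CSV_TEXT = """epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Convert epoch milliseconds to timestamps; "time" is an assumed column name.
df["time"] = pd.to_datetime(df["epochTimeMS"], unit="ms")

# Time-based windows need a monotonic index, so sort first (stable sort keeps
# the original order of rows that share a timestamp).
df = df.sort_values("time", kind="mergesort")


def rolling_mean_ms(frame, keys, window_ms):
    """Hypothetical helper: trailing time-window mean of timeTakenMS per group."""
    return (
        frame.set_index("time")
        .groupby(keys)["timeTakenMS"]
        .rolling(f"{window_ms}ms")
        .mean()
    )


by_event = rolling_mean_ms(df, "event", 500)
by_event_tag = rolling_mean_ms(df, ["event", "tag"], 500)
print(by_event)
print(by_event_tag)
```

Calling the helper with different `window_ms` values gives the varying millisecond windows asked about, and swapping `.mean()` for `.median()` yields a rolling median instead. Each result is a Series that can be plotted per group.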

If you could speak up on the mailing list about the API requirements of your application, it would be helpful for me and the other developers, since we are actively working on the time series capabilities right now.
