Pandas rolling median for duplicate time series data
Question
I see that Pandas does not yet allow duplicate time series indexes (https://github.com/pydata/pandas/issues/643), though support is due to be added soon. I am wondering whether there is a good way to apply rolling window means to a dataset with duplicate times, by a multi-index tag/column.
Basically, I have a CSV of unordered events, each consisting of an epoch time, hierarchical tags (tag1, tag2), and the time taken. A small sample:
epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
What I want to do is build and graph rolling means with varying millisecond windows, by event and by event+tag. This seems like something Pandas should be able to do, but I am not sure whether I need to wait for duplicate time-series index support first. Any thoughts on hacking this into place now?
Recommended answer
There's nothing really to stop you right now:
In [17]: idf = df.set_index(['tag', 'epochTimeMS'], verify_integrity=False).sort_index()

In [18]: idf
Out[18]:
                     event  timeTakenMS
tag  epochTimeMS
tag1 1331782842381  event2          436
     1331782842801  event1           16
     1331782842801  event1           17
tag2 1331782841535  event1         1278

In [20]: idf.ix['tag1']
Out[20]:
                event  timeTakenMS
epochTimeMS
1331782842381  event2          436
1331782842801  event1           16
1331782842801  event1           17
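For readers on a current pandas release, where the .ix indexer no longer exists, the session above can be approximated with .loc. This is only a minimal sketch: it assumes the four sample rows from the question are loaded from an inline string, and the names SAMPLE and tag1_rows are illustrative rather than part of the original session.

```python
import io

import pandas as pd

# The four sample rows from the question, inlined for a self-contained example.
SAMPLE = """epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Build a (tag, epochTimeMS) MultiIndex; duplicate index entries are allowed
# because verify_integrity defaults to False.
idf = df.set_index(["tag", "epochTimeMS"]).sort_index()

# Label-based selection of one tag; .loc stands in for the removed .ix here.
tag1_rows = idf.loc["tag1"]

print(idf)
print(tag1_rows)
print("index is unique:", idf.index.is_unique)
```

Selecting the whole tag1 group works even though two of its rows share the same timestamp.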
Accessing specific values by timestamp will cause an exception (this is going to be improved, as you mention), but you can certainly work with the data. Now, if you want a fixed-length (in time space) window, that's not supported very well yet, but I created an issue here:
https://github.com/pydata/pandas/issues/936
If you could speak up on the mailing list about the API requirements of your application, it would be helpful for me and the guys, since we're actively working on the time series capabilities right now.
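Later pandas releases accept an offset string as the rolling window on a datetime index, which covers the fixed-length time window discussed above. The following is a hedged sketch rather than the approach from the original answer: it assumes a recent pandas version, and the 500 ms window, the ts column, and the variable names are illustrative choices. Swapping mean() for median() gives the rolling median from the title.

```python
import io

import pandas as pd

# The four sample rows from the question, inlined for a self-contained example.
SAMPLE = """epochTimeMS,event,tag,timeTakenMS
1331782842801,event1,tag1,16
1331782841535,event1,tag2,1278
1331782842801,event1,tag1,17
1331782842381,event2,tag1,436
"""

events = pd.read_csv(io.StringIO(SAMPLE))

# Convert epoch milliseconds to timestamps (hypothetical column name "ts").
events["ts"] = pd.to_datetime(events["epochTimeMS"], unit="ms")

# Time-based windows need a monotonic index; duplicates are fine.
# A stable sort keeps rows with equal timestamps in their original order.
events = events.sort_values("ts", kind="mergesort").set_index("ts")

WINDOW = "500ms"  # illustrative window length, not taken from the question

# Rolling mean of timeTakenMS per tag, then per (event, tag) pair.
by_tag = events.groupby("tag")["timeTakenMS"].rolling(WINDOW).mean()
by_event_tag = (
    events.groupby(["event", "tag"])["timeTakenMS"].rolling(WINDOW).mean()
)

print(by_tag)
print(by_event_tag)
```

Each result is indexed by the group key(s) followed by the timestamp, so one group can be pulled out with .loc and plotted on its own.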