Time series normalization by event
Question
Suppose I have a Python dict as follows: for each product, the key is a timestamp and the value is the product's price at that timestamp.
data_dict = {
'product_1' : {1: 415, 2: 550, 3: 0, 4: 550, 5: 600},
'product_2' : {1: 400, 2: 300, 3: 300, 4: 0, 5: 300},
'product_3' : {1: 500, 2: 400, 3: 0, 4: 500, 5: 500},
'product_4' : {1: 0, 2: 200, 3: 200, 4: 300, 5: 300}
}
In time series analysis it is very common to renormalize many time series around some event. Assume the event is the timestamp at which the product is free (price 0). I would like to get a table with this structure:
          | -3  | -2  | -1  | 0 | +1  | +2  | +3  | +4  |
---------------------------------------------------------
product_1 | NA  | 415 | 550 | 0 | 550 | 600 | NA  | NA  |
product_2 | 400 | 300 | 300 | 0 | 300 | NA  | NA  | NA  |
product_3 | NA  | 500 | 400 | 0 | 500 | 500 | NA  | NA  |
product_4 | NA  | NA  | NA  | 0 | 200 | 200 | 300 | 300 |
Is there an easy way to do this with pandas? I'm sure plenty of data-science people have had to do something similar at some point. If not, it would be great if the pandas developers could add functionality for this in the future. In the meantime, any suggestions on how to go about it?
Recommended answer
You can use the .apply method, though it tends to be inefficient if you have many columns.
So, starting from this frame:
>>> df
   product_1  product_2  product_3  product_4
1        415        400        500          0
2        550        300        400        200
3          0        300          0        200
4        550          0        500        300
5        600        300        500        300
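The answer does not show how this frame was created. A minimal sketch, assuming it was built directly from the question's data_dict: passing a dict of dicts to the DataFrame constructor turns the outer keys into columns and the inner keys (the timestamps) into the index.

```python
import pandas as pd

# The dictionary from the question: product -> {timestamp: price}.
data_dict = {
    'product_1': {1: 415, 2: 550, 3: 0, 4: 550, 5: 600},
    'product_2': {1: 400, 2: 300, 3: 300, 4: 0, 5: 300},
    'product_3': {1: 500, 2: 400, 3: 0, 4: 500, 5: 500},
    'product_4': {1: 0, 2: 200, 3: 200, 4: 300, 5: 300},
}

# Outer keys become columns, inner keys become the row index.
df = pd.DataFrame(data_dict)
print(df)
```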
you define a synchronizing function as follows:
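>>> import numpy as np
>>> from pandas import Series
>>> def sync(ts):
...     vals = ts.values
...     # k is the position of the first zero, i.e. the event
...     n, k = len(vals), np.where(vals == 0)[0][0]
...     # re-index so that the event sits at offset 0
...     return Series(vals, np.arange(-k, n - k))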
and apply it column-wise:
>>> df.apply(sync).T
            -3   -2   -1    0    1    2    3    4
product_1  NaN  415  550    0  550  600  NaN  NaN
product_2  400  300  300    0  300  NaN  NaN  NaN
product_3  NaN  500  400    0  500  500  NaN  NaN
product_4  NaN  NaN  NaN    0  200  200  300  300
The .T at the end transposes the result, so that products are rows and event-relative offsets are columns.
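Since .apply can be slow with many columns, one possible alternative is to work in long format and pivot back. This is only a sketch, not part of the original answer: the names long, event_time, offset and aligned are illustrative, and the event is assumed to be the first timestamp at which a product's price is 0.

```python
import pandas as pd

data_dict = {
    'product_1': {1: 415, 2: 550, 3: 0, 4: 550, 5: 600},
    'product_2': {1: 400, 2: 300, 3: 300, 4: 0, 5: 300},
    'product_3': {1: 500, 2: 400, 3: 0, 4: 500, 5: 500},
    'product_4': {1: 0, 2: 200, 3: 200, 4: 300, 5: 300},
}
df = pd.DataFrame(data_dict)

# Long format: one row per (timestamp, product) pair.
long = (
    df.rename_axis("timestamp")
      .reset_index()
      .melt(id_vars="timestamp", var_name="product", value_name="price")
)

# Timestamp of the first zero price for each product (the assumed event).
event_time = long[long["price"] == 0].groupby("product")["timestamp"].min()

# Offset of every observation relative to its product's event.
long["offset"] = long["timestamp"] - long["product"].map(event_time)

# Products that never hit zero have no event; drop their rows.
long = long.dropna(subset=["offset"])

# Back to wide format: products as rows, offsets as columns.
aligned = long.pivot(index="product", columns="offset", values="price")
print(aligned)
```

Gaps in the pivoted table are filled with NaN, so the prices come back as floats rather than integers.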