Time series normalization by event

Problem description

Suppose I have a Python dict like the following, where for each product the keys are timestamps and the values are the product's price at that timestamp.

data_dict = {
'product_1' : {1: 415, 2: 550, 3: 0,   4: 550, 5: 600},
'product_2' : {1: 400, 2: 300, 3: 300, 4: 0,   5: 300},
'product_3' : {1: 500, 2: 400, 3: 0,   4: 500, 5: 500},
'product_4' : {1: 0,   2: 200, 3: 200, 4: 300, 5: 300}
 }

In time series analysis it is very common to renormalize many time series relative to some event. Assume the event is the timestamp at which the product is free (price 0). I would like to get a table with this structure:

           | -3  | -2  | -1  | 0 | +1  | +2  | +3  | +4  |
---------------------------------------------------------
product_1  | NA  | 415 | 550 | 0 | 550 | 600 | NA  | NA  |
product_2  | 400 | 300 | 300 | 0 | 300 | NA  | NA  | NA  |
product_3  | NA  | 500 | 400 | 0 | 500 | 500 | NA  | NA  |
product_4  | NA  | NA  | NA  | 0 | 200 | 200 | 300 | 300 |

Is there an easy way to do this with pandas? I'm sure many data-science people have had to do something similar at some point. If not, I would really appreciate it if the pandas developers could add functionality for this in the future. In the meantime, any suggestions on how to go about this?

Recommended answer

You can use the .apply method, though it tends to be inefficient if you have many columns.

So, starting from this frame:

>>> df
   product_1  product_2  product_3  product_4
1        415        400        500          0
2        550        300        400        200
3          0        300          0        200
4        550          0        500        300
5        600        300        500        300
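
The answer does not show how this frame was built. One way to get it, assuming the data_dict from the question, is to pass the nested dict straight to the DataFrame constructor, as in this minimal sketch:

```python
import pandas as pd

data_dict = {
    'product_1': {1: 415, 2: 550, 3: 0, 4: 550, 5: 600},
    'product_2': {1: 400, 2: 300, 3: 300, 4: 0, 5: 300},
    'product_3': {1: 500, 2: 400, 3: 0, 4: 500, 5: 500},
    'product_4': {1: 0, 2: 200, 3: 200, 4: 300, 5: 300},
}

# Outer keys become columns; inner keys (the timestamps) become the row index.
df = pd.DataFrame(data_dict)
print(df)
```
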

Define a synchronizing function as follows:

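>>> import numpy as np
>>> from pandas import Series
>>> def sync(ts):
...     vals = ts.values
...     # k: position of the first zero (the event); assumes the column has one
...     n, k = len(vals), np.where(vals == 0)[0][0]
...     # relabel the positions so that the event sits at offset 0
...     return Series(vals, np.arange(-k, n - k))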

Then apply it column-wise:

>>> df.apply(sync).T
            -3   -2   -1   0    1    2    3    4
product_1  NaN  415  550   0  550  600  NaN  NaN
product_2  400  300  300   0  300  NaN  NaN  NaN
product_3  NaN  500  400   0  500  500  NaN  NaN
product_4  NaN  NaN  NaN   0  200  200  300  300

The .T at the end transposes the result, so that products become rows and offsets become columns.
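
Since .apply may become slow with many columns, one possible alternative (not part of the original answer, and not benchmarked here) is to reshape the frame to long format, subtract each product's event timestamp, and pivot back. The names tidy, event_time and aligned below are illustrative. Unlike the positional approach, this sketch measures offsets in timestamp units, which gives the same result whenever the timestamps are consecutive integers, as they are here.

```python
import pandas as pd

df = pd.DataFrame({
    'product_1': {1: 415, 2: 550, 3: 0, 4: 550, 5: 600},
    'product_2': {1: 400, 2: 300, 3: 300, 4: 0, 5: 300},
    'product_3': {1: 500, 2: 400, 3: 0, 4: 500, 5: 500},
    'product_4': {1: 0, 2: 200, 3: 200, 4: 300, 5: 300},
})

# Long format: one row per (timestamp, product) pair.
tidy = (df.rename_axis('timestamp')
          .reset_index()
          .melt(id_vars='timestamp', var_name='product', value_name='price'))

# First timestamp at which each product's price is 0 (the event).
event_time = tidy.loc[tidy['price'] == 0].groupby('product')['timestamp'].min()

# Offset of every observation from its own product's event.
tidy['offset'] = tidy['timestamp'] - tidy['product'].map(event_time)

# A product that never hits 0 would get NaN offsets; drop such rows before pivoting.
tidy = tidy.dropna(subset=['offset'])

# Back to wide format: products as rows, event-relative offsets as columns.
aligned = tidy.pivot(index='product', columns='offset', values='price')
print(aligned)
```

Offsets with no observation for a product come out as NaN, matching the NA cells in the table the question asks for.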
