pandas shift time series with missing values

Problem description

I have a time series with some missing entries that looks like this:

date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13

I would like to create a "previous_value" column, but I only want it to show values for consecutive years. So I want it to look like this:

date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan

However, applying the pandas shift function directly to the 'value' column would give 'previous_value' = 10 for 'date' = 2003 and 'previous_value' = 12 for 'date' = 2007.
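To make the problem concrete, here is a minimal sketch that rebuilds the sample data above and applies a plain shift. The variable name `naive` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": [2000, 2001, 2003, 2004, 2005, 2007],
                   "value": [5, 10, 8, 72, 12, 13]})

# shift() moves values down by one row position; it knows nothing about
# the gaps in 'date', so 2003 receives the 2001 value and 2007 the 2005 value.
naive = df["value"].shift()
print(df.assign(previous_value=naive))
```

The rows for 2003 and 2007 end up with 10 and 12, which are not values from the immediately preceding year.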

What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute.)

Recommended answer

In [587]: import pandas as pd

In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10             5.0
2  2003      8             NaN
3  2004     72             8.0
4  2005     12            72.0
5  2007     13             NaN
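The one-liner above keeps a shifted value only where the current date is exactly one more than the previous row's date; rows dropped by the boolean mask come back as NaN when the result is aligned on the index during assignment. An equivalent way to spell the same idea, shown here as a sketch rather than as the answer's own code, uses `diff()` and `where()`. The name `consecutive` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": [2000, 2001, 2003, 2004, 2005, 2007],
                   "value": [5, 10, 8, 72, 12, 13]})

# True where the current year directly follows the previous row's year.
# The first row's diff is NaN, and NaN == 1 evaluates to False.
consecutive = df["date"].diff() == 1

# Keep the shifted value where the mask is True, NaN everywhere else.
df["previous_value"] = df["value"].shift().where(consecutive)
print(df)
```

Both versions assume the rows are already sorted by 'date'; if they are not, sorting with `df.sort_values('date')` first would be needed.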

Also see the related question "Using shift() with unevenly spaced data" for a time series approach using resample().
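The general idea behind a time series approach is to make the series regular first, so that a one-step shift really means "one year earlier". The sketch below is not the resample() code from the linked question; it swaps in a plain `reindex` over the full range of integer years, which fits this example because 'date' holds integers rather than timestamps. The names `s`, `full_years`, and `prev` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": [2000, 2001, 2003, 2004, 2005, 2007],
                   "value": [5, 10, 8, 72, 12, 13]})

# Index the values by year, then insert the missing years as NaN rows.
s = df.set_index("date")["value"]
full_years = list(range(int(s.index.min()), int(s.index.max()) + 1))
prev = s.reindex(full_years).shift()

# Look up each original year in the shifted, gap-free series.
df["previous_value"] = prev.reindex(df["date"]).to_numpy()
print(df)
```

Because the missing years are present as NaN before shifting, a year that follows a gap picks up NaN instead of the value from two years earlier.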
