Using shift() with unevenly spaced data

Question

Hopefully this example speaks for itself. I want to create 'lagval' with shift(), but it needs to be NaN if the prior year is missing.

import numpy as np
import pandas as pd
df = pd.DataFrame({'yr': [2007, 2008, 2009, 2011, 2012],
                   'val': np.random.randn(5)})

Desired output (lagval):

In [1118]: df
Out[1118]: 
        val    yr    lagval
0 -0.978139  2007       NaN
1  0.117912  2008 -0.978139
2 -1.031884  2009  0.117912
3  0.606856  2011       NaN
4 -0.200864  2012  0.606856

I have a decent solution for this (posted as an answer), but am looking for alternatives. I have spent some time looking at all the time-series functions, but they seem like overkill here: it looks like I would end up converting the year to a true timestamp, resampling, shifting, and then dropping the missing values. But maybe there is a simpler way?
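A possible simpler alternative, not taken from the original thread: assuming the rows are sorted by year with at most one row per year, take an ordinary positional shift() and blank it out wherever the previous row is not exactly one year earlier. A minimal sketch, using the values from the desired output above in place of random data:

```python
import pandas as pd

df = pd.DataFrame({'yr': [2007, 2008, 2009, 2011, 2012],
                   'val': [-0.978139, 0.117912, -1.031884, 0.606856, -0.200864]})

# Assumes rows are sorted by 'yr' with at most one row per year.
# Plain positional lag: the previous row's value, whatever year it belongs to.
prev_val = df['val'].shift(1)

# Keep the lag only where the previous row is exactly one year earlier;
# where() replaces every other entry (including the first row) with NaN.
df['lagval'] = prev_val.where(df['yr'].diff() == 1)
print(df)
```

The mask comes from diff(), which is NaN for the first row and 2 for 2011, so both of those rows compare unequal to 1 and get a NaN lag.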

Answer

For what it's worth, here's a time-series solution, which obviously takes a bit more code.

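import datetime
import pandas as pd
# Index by a real timestamp (Jan 1 of each year), then conform to an annual
# frequency; missing years (2010 here) become all-NaN filler rows.
df = df.set_index(df['yr'].apply(lambda x: datetime.datetime(x, 1, 1)))
df = df.resample('YS').mean()  # year-start bins; the old 'A' alias is deprecated in pandas 2.2+

df['lagval'] = df['val'].shift(1)
df = df[pd.notnull(df['yr'])]  # drop the filler rows again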
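A related time-series variant, sketched as an illustration rather than as part of the original answer: instead of inserting filler rows with resample() and removing them again, shift the timestamps themselves by passing freq to shift(), then realign on the observed years. The choice of pd.DateOffset(years=1) is an assumption; any offset that maps Jan 1 of one year to Jan 1 of the next would do.

```python
import pandas as pd

df = pd.DataFrame({'yr': [2007, 2008, 2009, 2011, 2012],
                   'val': [-0.978139, 0.117912, -1.031884, 0.606856, -0.200864]})

# Label each value with a timestamp for Jan 1 of its year.
idx = pd.to_datetime(df['yr'].astype(str).to_numpy(), format='%Y')
s = pd.Series(df['val'].to_numpy(), index=idx)

# With freq=..., shift() moves the timestamps forward one year instead of
# moving values down one row: 2009's value is relabelled 2010, and so on.
lagged = s.shift(1, freq=pd.DateOffset(years=1))

# Align back to the observed years; a year whose predecessor is missing
# finds no matching label and gets NaN.
df['lagval'] = lagged.reindex(idx).to_numpy()
print(df)
```

Because no rows are inserted, 'yr' keeps its integer dtype here, whereas after resample().mean() it comes back as float.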

I'm not familiar with Stata, but just skimming the docs, it sounds like tsset does something similar (conforming the data to a specified frequency)?
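For the "conforming the data to a specified frequency" idea, timestamps are not strictly necessary when the periods are integer years. A minimal sketch (an illustration of the general idea, not a reproduction of what Stata's tsset does internally) that reindexes on a complete range of years so that a positional shift() becomes a true one-year lag:

```python
import pandas as pd

df = pd.DataFrame({'yr': [2007, 2008, 2009, 2011, 2012],
                   'val': [-0.978139, 0.117912, -1.031884, 0.606856, -0.200864]})

# Index by the integer year and conform to a gap-free yearly grid;
# missing years (2010 here) appear as NaN.
s = df.set_index('yr')['val']
full = s.reindex(range(s.index.min(), s.index.max() + 1))

# On the complete grid, shifting by one row is shifting by one year.
lag = full.shift(1)

# Pull the lag back onto the observed years only.
df['lagval'] = lag.reindex(df['yr']).to_numpy()
print(df)
```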
