Lagging panel data with data.table

Problem description

I currently lag panel data using data.table in the following manner:

require(data.table)
x <- data.table(id=1:10, t=rep(1:10, each=10), v=1:100)
setkey(x, id, t) #so that things are in increasing order
x[,lag_v:=c(NA, v[1:(length(v)-1)]),by=id]

Is there a better way to do this? I found a suggestion online to use a cross join, which makes sense. However, a cross join would generate a very large data.table for a large dataset, so I am hesitant to use it.

Solution

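I'm not sure this is much different from your approach, but you can use the fact that x is keyed by id. Note that current data.table versions need by = .EACHI so that j is evaluated separately for each id in i; without it, the lag would run across id boundaries:

x[J(1:10), lag_v := c(NA, head(v, -1)), by = .EACHI]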

I have not tested whether this is faster than by, especially if it is already keyed.
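
To check the point above, here is a minimal sketch that compares the two approaches on a larger panel. The panel size and the column names lag_by and lag_join are made up for illustration, and it assumes a data.table version that supports by = .EACHI. Timings depend on the machine, so none are quoted here.

```r
library(data.table)

# Hypothetical larger panel: 1,000 ids observed over 100 periods.
n_id <- 1000L
n_t  <- 100L
x <- data.table(
  id = rep(seq_len(n_id), each = n_t),
  t  = rep(seq_len(n_t), times = n_id)
)
x[, v := .I]
setkey(x, id, t)  # sort by id, then by time within id

# Grouped assignment with by, as in the question.
time_by <- system.time(
  x[, lag_by := c(NA, head(v, -1)), by = id]
)

# Keyed join; by = .EACHI evaluates j once per id listed in i.
time_join <- system.time(
  x[J(seq_len(n_id)), lag_join := c(NA, head(v, -1)), by = .EACHI]
)

# Both columns should agree, including the NA positions.
stopifnot(identical(x$lag_by, x$lag_join))

# Elapsed seconds for each approach.
print(c(by = time_by[["elapsed"]], join = time_join[["elapsed"]]))
```

If the stopifnot() call passes, the two approaches produce the same lag column, and the printed elapsed times give a rough idea of which is faster on your data.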

Or, using the fact that t (don't use functions as variable names!) is the time id

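x <- data.table(id=1:10, t=rep(1:10, each=10), v=1:100)
setkey(x, t)
replacing <- J(setdiff(x[, unique(t)], 1))        # every period except the first
replacing_with <- J(setdiff(x[, unique(t)], 10))  # every period except the last
x[replacing, lag_v := x[replacing_with, v]]       # assumes a balanced panel, ids in the same order within each t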

But again, using a double join here seems inefficient.
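
As an alternative to both joins, later data.table releases (1.9.6 and newer, as far as I know) ship a shift() function that lags a vector directly. The sketch below assumes such a version is installed. It also renames the time column to period, a name chosen here only to follow the advice above about not using function names as variable names.

```r
library(data.table)

# Same toy panel as in the question, with the time column renamed to `period`.
x <- data.table(id = 1:10, period = rep(1:10, each = 10), v = 1:100)
setkey(x, id, period)  # sort so each id's rows are in time order

# shift() defaults to a one-step lag and pads the first row of each group with NA.
x[, lag_v := shift(v), by = id]

# Inspect one id to confirm that the lag stays within the group.
print(x[id == 1L])
```

This keeps the grouping by id from the question and changes only how the lagged vector is built, so no join or cross join is involved.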
