Lagging panel data with data.table

Views: 389 | Published: 2017/3/12 11:00:38 | Tags: r, time-series, data.table

Problem description

I currently lag panel data using data.table in the following manner:

    require(data.table)
    x <- data.table(id=1:10, t=rep(1:10, each=10), v=1:100)
    setkey(x, id, t)  # so that things are in increasing order
    x[, lag_v := c(NA, v[1:(length(v)-1)]), by=id]

Is there a better way to do this? I found something online about cross-joins, which makes sense. However, a cross-join would generate a fairly large data.table for a large dataset, so I am hesitant to use it.
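The question does not say which cross-join recipe it found. One common pattern, sketched here as an assumption, builds every id-by-t combination with CJ() and joins the panel onto it. That makes gaps in the panel explicit, but the result always has (number of ids) times (number of periods) rows, which is the size concern raised above. The table p and its values are made up for illustration.

```r
library(data.table)

# Hypothetical unbalanced panel: id 2 has no observation at t = 2
p <- data.table(id = c(1L, 1L, 1L, 2L, 2L),
                t  = c(1L, 2L, 3L, 1L, 3L),
                v  = c(10, 11, 12, 20, 22))
setkey(p, id, t)

# Every id-t combination, whether observed or not
grid <- CJ(id = unique(p$id), t = unique(p$t))

# Join the panel onto the grid; unobserved combinations get v = NA
full <- p[grid]

# Lag within id; the lag across a gap is now NA instead of an older value
full[, lag_v := c(NA, head(v, -1)), by = id]

print(nrow(p))     # rows actually observed
print(nrow(full))  # rows after the cross join
print(full)
```

For a sparse panel with many ids and periods, full can be far larger than p, so this is mainly worth it when gaps in t must be respected.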

Solution

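I'm not sure this is much different from your approach, but you can use the fact that x is keyed by id:

    # by = .EACHI evaluates j once per id in i; data.table 1.9.4 and later
    # need it (earlier versions grouped by each row of i implicitly)
    x[J(1:10), lag_v := c(NA, head(v, -1)), by = .EACHI]

I have not tested whether this is faster than by, especially if it is already keyed.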
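A minimal sketch of how the two approaches could be checked against each other and timed. The larger table big and its size are assumptions chosen only for illustration, and timings will vary by machine and data.table version, so no result is claimed here.

```r
library(data.table)

x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, id, t)

# Grouped version from the question
x[, lag_by := c(NA, head(v, -1)), by = id]

# Keyed-join version; by = .EACHI evaluates j once per id in i
x[J(1:10), lag_join := c(NA, head(v, -1)), by = .EACHI]

# The two columns should agree row by row
same <- identical(x$lag_by, x$lag_join)
print(same)

# Hypothetical larger balanced panel; CJ() returns it keyed by id, t
big <- CJ(id = 1:2000, t = 1:50)
big[, v := .I]

t_by   <- system.time(big[, lag_by := c(NA, head(v, -1)), by = id])
t_join <- system.time(big[J(1:2000), lag_join := c(NA, head(v, -1)), by = .EACHI])

print(c(by = t_by[["elapsed"]], join = t_join[["elapsed"]]))
```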

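Or, using the fact that t (don't use function names as variable names!) is the time id:

    x <- data.table(id=1:10, t=rep(1:10, each=10), v=1:100)
    setkey(x, t)
    # time periods that have a predecessor (every t except the first)
    replacing <- setdiff(x[, unique(t)], 1)
    # join on t, and fill lag_v from a second join on t - 1;
    # this lines up only if every period holds the same ids in the same order
    x[J(replacing), lag_v := x[J(replacing - 1L), v]]

But again, using a double join here seems inefficient.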
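Neither answer above uses it, but data.table 1.9.6 and later also ship a shift() function that produces a lag directly. The sketch below is an addition to the original answer; it assumes such a version is installed and compares the result with the head()-based lag used earlier.

```r
library(data.table)

x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, id, t)

# Lag v by one period within each id; the first row of each id gets NA
x[, lag_v := shift(v, n = 1L, type = "lag"), by = id]

# Reference column built the same way as in the answer above
x[, lag_ref := c(NA, head(v, -1)), by = id]

print(identical(x$lag_v, x$lag_ref))
print(head(x, 12))
```

Like the other approaches, shift() lags by row position within each group, so it assumes the rows are sorted by t within id (here guaranteed by setkey) and that no periods are missing.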
