Lagging panel data with data.table

Published: 2022/1/11 9:51:50. Tags: r, time-series, data.table

Question

I currently lag panel data using data.table as follows:

library(data.table)
x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, id, t)  # sort by id, then t, so each id's rows are in time order
x[, lag_v := c(NA, v[1:(length(v) - 1)]), by = id]

Is there a better way to do this? I found something online about using a cross-join, which makes sense. However, a cross-join would generate a fairly large data.table for a large dataset, so I am hesitant to use it.

Recommended answer

I'm not sure this is much different from your approach, but you can use the fact that x is keyed by id:

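# by = .EACHI (data.table >= 1.9.4) evaluates j once per id; older versions did this implicitly
x[J(1:10), lag_v := c(NA, head(v, -1)), by = .EACHI]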

I have not tested whether this is faster than using by, especially since the table is already keyed.
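
As a rough check (not part of the original answer), the sketch below rebuilds the toy panel from the question and confirms that the grouped form and the keyed-join form produce the same lag. It assumes a current data.table release, where a keyed join needs by = .EACHI to evaluate j once per id; the column names lag_by and lag_join are illustrative only.

```r
library(data.table)

# Same toy panel as in the question: 10 ids observed over 10 periods.
x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, id, t)

# Grouped version: one lag vector per id.
x[, lag_by := c(NA, head(v, -1)), by = id]

# Keyed-join version: by = .EACHI evaluates j once per id listed in i.
x[J(1:10), lag_join := c(NA, head(v, -1)), by = .EACHI]

# Both columns should agree row for row.
stopifnot(identical(x$lag_by, x$lag_join))
```

Wrapping each assignment in system.time() on a realistically sized table would be one way to compare speed; this sketch makes no claim about which form is faster.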

Or, using the fact that t (don't use functions as variable names!) is the time id:

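x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, t)
replacing <- setdiff(x[, unique(t)], 1)  # every period except the first
# The inner join pulls v from the preceding period. This assumes a balanced
# panel with ids in the same order within each t.
x[J(replacing), lag_v := x[J(replacing - 1L), v]]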

But again, using a double join here seems inefficient.
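
For comparison, and not part of the original answer: data.table 1.9.6 and later ship a shift() function that lags a vector and pads with NA, so the grouped lag can be written without building the vector by hand. A minimal sketch on the same toy panel, assuming rows are already ordered by time within each id:

```r
library(data.table)

x <- data.table(id = 1:10, t = rep(1:10, each = 10), v = 1:100)
setkey(x, id, t)  # order rows by id, then time

# shift() lags by one row by default (n = 1, type = "lag") and fills with NA.
x[, lag_v := shift(v), by = id]

# Inspect the first few rows for one id.
x[id == 1][1:3]
```

Like the hand-written version, shift() works on row position, so it only gives a true one-period lag when no id has missing periods.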
