Dynamically subsetting a data table
Question
I have a question about dynamically subsetting a data table. I know there are numerous similarly titled threads on Stack Overflow, but unfortunately they didn't lead me to the solution I need.
Example data set:
require(data.table)
dt <- data.table(date=c(rep(1,5),rep(2,5)),id=rep(1:5,2),var=c(1:10))
For each ID, I would like to find the subset of all other IDs in all earlier periods. The example data set has 5 IDs and two periods. For ID=5 in period 2, the corresponding subset consists of the rows with ID in {1,2,3,4} and date=1. In this simple data set I can of course code this by hand:
dt[,dt[-.I][date<2],by=id]
However, I would like to do this automatically. I tried something like
dt[,dt[-.I][date < unique(dt$date[.I])],by=id]
but this does not work.
Any helpful comments are appreciated! Thanks!
Recommended answer
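I think this is a faster solution:
library(data.table)
dta <- data.table(date=c(rep(1,5),rep(2,5)),id=rep(1:5,2),var=c(1:10))
# For each (id, date) group (a single row here), keep the rows of dta with a different id and an earlier date
dta[,dta[dta[.I]$id!=dta$id & dta[.I]$date>dta$date],by=list(id,date)]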
Any comments on how to make this code even faster are highly appreciated.
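One possible direction, offered only as a sketch and not benchmarked here: the same result can be expressed as a non-equi self-join, which avoids re-scanning the whole table once per group. The output column names (`other_id`, `other_date`, `other_var`) are made up for illustration, and `nomatch = NULL` assumes a reasonably recent data.table version.

```r
library(data.table)

dt <- data.table(date = c(rep(1, 5), rep(2, 5)), id = rep(1:5, 2), var = 1:10)

# Self-join: for every focal row (i), match all rows of dt (x) with an earlier date.
# allow.cartesian = TRUE is needed because each focal row can match many rows.
res <- dt[dt,
          on = .(date < date),
          nomatch = NULL,
          allow.cartesian = TRUE,
          .(id = i.id, date = i.date,
            other_id = x.id, other_date = x.date, other_var = x.var)]

# `on` does not support !=, so drop matches on the same id afterwards.
res <- res[id != other_id]
```

Each row of `res` pairs a focal `(id, date)` with one row from an earlier period belonging to a different ID. Whether this is actually faster than the grouped version depends on the data size, so it is worth timing both on the real table.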