Defining a variable by logical subsetting on a time interval in data.table


Question

I have a data.table that looks like this:

    id event state      time
 1:  A     0  NULL 0.8998250
 2:  A     1  NULL 1.1459127
 3:  A     0  NULL 1.1879722
 4:  A     2  NULL 1.5158930
 5:  A     0  NULL 2.4703966
 6:  B     0  NULL 0.8895393
 7:  B     1  NULL 1.5823427
 8:  B     2  NULL 2.2228495
 9:  B     0  NULL 3.2171193
10:  B     0  NULL 3.8728251
11:  C     1  NULL 0.7085305
12:  C     0  NULL 1.2525965
13:  C     2  NULL 1.8467385
14:  C     0  NULL 2.1358983
15:  C     0  NULL 2.2830119

I want to set the variable state to 1 for the rows between event 1 and event 2. Each of the two events occurs only once per id, and event=1 always comes before event=2.

The following code generates the above data.table:

library(data.table)

# Defining variables and the data.table
id <- rep(LETTERS[1:3],each=5)
set.seed(123)
event <- c(sample(c(0,1),2,F),sample(c(0,0,2),3,F),
           sample(c(0,1),2,F),sample(c(0,0,2),3,F),
           sample(c(0,1),2,F),sample(c(0,0,2),3,F))
state <- "NULL"
time <- c(apply(matrix(runif(3*5),5,3),2,cumsum))
DT <- data.table(id,event,state,time) 
DT

I have tried the code below to assign the value 1 to the state variable for the rows between the time points of event==1 and event==2:

DT[time>=time[event==1] & time<=time[event==2],state:="1",by=id]

But this generates the following output:

    id event state      time
 1:  A     0  NULL 0.8998250
 2:  A     1  NULL 1.1459127
 3:  A     0     1 1.1879722
 4:  A     2     1 1.5158930
 5:  A     0  NULL 2.4703966
 6:  B     0     1 0.8895393
 7:  B     1  NULL 1.5823427
 8:  B     2     1 2.2228495
 9:  B     0  NULL 3.2171193
10:  B     0  NULL 3.8728251
11:  C     1  NULL 0.7085305
12:  C     0     1 1.2525965
13:  C     2  NULL 1.8467385
14:  C     0     1 2.1358983
15:  C     0  NULL 2.2830119

Here the state=1 values are clearly in the wrong rows. I can't figure out what data.table is doing. Can you see why data.table behaves this way, and is there a nifty solution to my problem?

Recommended answer

DT[,state:= ifelse(time>=time[event==1] & time<=time[event==2],1,state),by=id]

#    id event state      time
# 1:  A     0  NULL 0.8998250
# 2:  A     1     1 1.1459127
# 3:  A     0     1 1.1879722
# 4:  A     2     1 1.5158930
# 5:  A     0  NULL 2.4703966
# 6:  B     0  NULL 0.8895393
# 7:  B     1     1 1.5823427
# 8:  B     2     1 2.2228495
# 9:  B     0  NULL 3.2171193
#10:  B     0  NULL 3.8728251
#11:  C     1     1 0.7085305
#12:  C     0     1 1.2525965
#13:  C     2     1 1.8467385
#14:  C     0  NULL 2.1358983
#15:  C     0  NULL 2.2830119
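
As for why the original attempt misplaces the 1s: data.table evaluates the `i` expression once on the whole table, before `by` is applied. So `time[event==1]` and `time[event==2]` each have length 3 (one value per id) and are recycled along the 15 rows. The sketch below reproduces that. It rebuilds the table from the printed values rather than from `set.seed`, since `sample()` results can differ between R versions. The names `lower`, `upper` and `wrong_rows` are only for illustration, and `between()` with `fifelse()` is an alternative spelling of the `ifelse()` line above, not the answerer's code. `fifelse()` is assumed to be available, which needs a reasonably recent data.table.

```r
library(data.table)

# Rebuild the table from the printed values (avoids depending on the RNG).
DT <- data.table(
  id    = rep(c("A", "B", "C"), each = 5),
  event = c(0, 1, 0, 2, 0,  0, 1, 2, 0, 0,  1, 0, 2, 0, 0),
  state = "NULL",
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)

# What the original `i` expression sees: the bounds come from the whole
# table, so each has length 3 and is recycled along the 15 rows.
lower <- DT[event == 1, time]
upper <- DT[event == 2, time]
wrong_rows <- DT[, which(time >= lower & time <= upper)]
wrong_rows
# 3 4 6 8 12 14  (the rows that got state "1" in the question)

# Doing the comparison in `j` means `by = id` applies to it, so each group
# uses its own event times. between() includes both bounds by default.
DT[, state := fifelse(between(time, time[event == 1], time[event == 2]),
                      "1", state),
   by = id]
DT[state == "1", which = TRUE]
# 2 3 4 7 8 11 12 13
```

Keeping the condition in `j` is what makes the grouping take effect. Anything placed in `i` filters the full table first, and only the surviving rows are then grouped.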
