Defining a variable by logical subsetting on a time interval in data.table
Question
I have a data.table that looks like this:
id event state time
1: A 0 NULL 0.8998250
2: A 1 NULL 1.1459127
3: A 0 NULL 1.1879722
4: A 2 NULL 1.5158930
5: A 0 NULL 2.4703966
6: B 0 NULL 0.8895393
7: B 1 NULL 1.5823427
8: B 2 NULL 2.2228495
9: B 0 NULL 3.2171193
10: B 0 NULL 3.8728251
11: C 1 NULL 0.7085305
12: C 0 NULL 1.2525965
13: C 2 NULL 1.8467385
14: C 0 NULL 2.1358983
15: C 0 NULL 2.2830119
I want to give the variable state the value 1 for the rows between event 1 and event 2. The two events occur only once for each id, and event=1 always comes before event=2.
The following code generates the above data.table:
library(data.table)
# Define the variables and build the data.table
id <- rep(LETTERS[1:3],each=5)
set.seed(123)
event <- c(sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE),
sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE),
sample(c(0,1),2,FALSE),sample(c(0,0,2),3,FALSE))
state <- "NULL"
time <- c(apply(matrix(runif(3*5),5,3),2,cumsum))
DT <- data.table(id,event,state,time)
DT
I tried the code below to assign the value 1 to the state variable for the rows between the time points of event==1 and event==2:
DT[time>=time[event==1] & time<=time[event==2],state:="1",by=id]
But this generates the following output:
id event state time
1: A 0 NULL 0.8998250
2: A 1 NULL 1.1459127
3: A 0 1 1.1879722
4: A 2 1 1.5158930
5: A 0 NULL 2.4703966
6: B 0 1 0.8895393
7: B 1 NULL 1.5823427
8: B 2 1 2.2228495
9: B 0 NULL 3.2171193
10: B 0 NULL 3.8728251
11: C 1 NULL 0.7085305
12: C 0 1 1.2525965
13: C 2 NULL 1.8467385
14: C 0 1 2.1358983
15: C 0 NULL 2.2830119
The state=1 values are clearly placed in the wrong rows of the data.table. I can't figure out what data.table is doing. Can you see why data.table behaves this way, and is there a nifty solution to my problem?
Recommended answer
DT[,state:= ifelse(time>=time[event==1] & time<=time[event==2],1,state),by=id]
# id event state time
# 1: A 0 NULL 0.8998250
# 2: A 1 1 1.1459127
# 3: A 0 1 1.1879722
# 4: A 2 1 1.5158930
# 5: A 0 NULL 2.4703966
# 6: B 0 NULL 0.8895393
# 7: B 1 1 1.5823427
# 8: B 2 1 2.2228495
# 9: B 0 NULL 3.2171193
#10: B 0 NULL 3.8728251
#11: C 1 1 0.7085305
#12: C 0 1 1.2525965
#13: C 2 1 1.8467385
#14: C 0 NULL 2.1358983
#15: C 0 NULL 2.2830119
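As for why the original attempt misplaces the values, a likely explanation is that the expression in i is evaluated once on the whole table, before by=id splits anything. At that point time[event==1] and time[event==2] each hold three values (one per id), and R recycles them along all 15 rows, so most rows are compared against another id's event times. The sketch below is an illustration, not part of the answer above: it types in the table values from the question directly (so it does not depend on the random number generator), reproduces the recycled mask, and then shows one alternative that does the comparison per group and collects row numbers with .I. The names global_mask and rows are made up for this example.

```r
library(data.table)

# Values copied from the table in the question.
DT <- data.table(
  id    = rep(c("A", "B", "C"), each = 5),
  event = c(0, 1, 0, 2, 0,  0, 1, 2, 0, 0,  1, 0, 2, 0, 0),
  state = "NULL",
  time  = c(0.8998250, 1.1459127, 1.1879722, 1.5158930, 2.4703966,
            0.8895393, 1.5823427, 2.2228495, 3.2171193, 3.8728251,
            0.7085305, 1.2525965, 1.8467385, 2.1358983, 2.2830119)
)

# The original i expression, evaluated on the whole table: the three start
# times and three end times are recycled along all 15 rows.
global_mask <- with(DT, time >= time[event == 1] & time <= time[event == 2])
which(global_mask)
# 3 4 6 8 12 14  (the rows that wrongly received state = "1" in the question)

# Doing the comparison in j with by = id compares each row only with the
# event times of its own id. .I gives the row numbers in the full table.
rows <- DT[, .I[time >= time[event == 1] & time <= time[event == 2]], by = id]$V1
rows
# 2 3 4 7 8 11 12 13

# Assign by row number; state stays a character column throughout.
DT[rows, state := "1"]
DT
```

Compared with the ifelse version, this only touches the rows inside each interval and leaves every other row untouched. Like the ifelse version, it relies on each id having exactly one event==1 row and one event==2 row, as stated in the question.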