R data.table: condition within group, but recorded at first instance in group
Problem description
I have data that looks a bit like this:
df <- data.frame(ID = c(rep(1, 4), rep(2, 2), rep(3, 2), 4),
                 TYPE = c(1, 3, 2, 4, 1, 2, 2, 3, 2),
                 SEQUENCE = c(seq(1, 4), 1, 2, 1, 2, 1))

ID TYPE SEQUENCE
 1    1        1
 1    3        2
 1    2        3
 1    4        4
 2    1        1
 2    2        2
 3    2        1
 3    3        2
 4    2        1
I now need to check whether a certain type is present in each ID block (as a binary flag), but only record the answer in the first record of each block (SEQUENCE == 1).
The best I have come up with so far is flagging each type in the row where it occurs, e.g.
library(data.table)
DT <- data.table(df)
DT$A[DT$TYPE == 1] <- 1
DT$B[DT$TYPE == 2] <- 1
DT$C[DT$TYPE == 3] <- 1
DT$D[DT$TYPE == 4] <- 1
DT[is.na(DT)] <- 0
RESULT:
ID TYPE SEQUENCE A B C D
 1    1        1 1 0 0 0
 1    3        2 0 0 1 0
 1    2        3 0 1 0 0
 1    4        4 0 0 0 1
 2    1        1 1 0 0 0
 2    2        2 0 1 0 0
 3    2        1 0 1 0 0
 3    3        2 0 0 1 0
 4    2        1 0 1 0 0
However, the result should look like this:
ID TYPE SEQUENCE A B C D
 1    1        1 1 1 1 1
 1    3        2 0 0 0 0
 1    2        3 0 0 0 0
 1    4        4 0 0 0 0
 2    1        1 1 1 0 0
 2    2        2 0 0 0 0
 3    2        1 0 1 1 0
 3    3        2 0 0 0 0
 4    2        1 0 1 0 0
I assume this can be done with data.table, but I haven't quite found the correct syntax.
Solution
This makes one copy of the data.table:
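# Label each TYPE value 1..4 as A..D
DT[, FAC := factor(TYPE, labels = LETTERS[1:4])]
# Reshape to wide: one 0/1 count column per label (this step creates the copy)
DT <- dcast.data.table(DT, ID + TYPE + SEQUENCE ~ FAC, fun.aggregate = length)
# Per ID: put "type present anywhere in the group" on the first row, 0 on the rest
DT[, LETTERS[1:4] := lapply(.SD,
       function(x) c(any(as.logical(x)), rep(0L, length(x) - 1))),
   .SDcols = LETTERS[1:4], by = ID]
#    ID TYPE SEQUENCE A B C D
# 1:  1    1        1 1 1 1 1
# 2:  1    2        3 0 0 0 0
# 3:  1    3        2 0 0 0 0
# 4:  1    4        4 0 0 0 0
# 5:  2    1        1 1 1 0 0
# 6:  2    2        2 0 0 0 0
# 7:  3    2        1 0 1 1 0
# 8:  3    3        2 0 0 0 0
# 9:  4    2        1 0 1 0 0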
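Note that the reshaped table above is sorted by ID, TYPE and SEQUENCE, so the flags land on the first row of each group in that order; in this sample that row happens to be the one with SEQUENCE == 1. As a possible alternative, and not the answer's method, the following is a minimal sketch that skips the reshape and ties the flag to SEQUENCE == 1 explicitly. The name `flag_cols` and the mapping of columns A to D onto TYPE values 1 to 4 are assumptions made for illustration.

```r
library(data.table)

# Sample data from the question
DT <- data.table(
  ID       = c(rep(1, 4), rep(2, 2), rep(3, 2), 4),
  TYPE     = c(1, 3, 2, 4, 1, 2, 2, 3, 2),
  SEQUENCE = c(seq(1, 4), 1, 2, 1, 2, 1)
)

# Hypothetical mapping: column A flags TYPE 1, B flags TYPE 2, and so on
flag_cols <- LETTERS[1:4]

# Within each ID, check whether TYPE k occurs anywhere in the group,
# and keep that flag only on the row where SEQUENCE == 1 (0 elsewhere).
DT[, (flag_cols) := lapply(1:4, function(k)
       as.integer(any(TYPE == k) & SEQUENCE == 1)),
   by = ID]

print(DT)
```

Because `:=` adds the columns by reference, this sketch keeps the original row order and does not copy the table; whether that matters depends on the size of the data.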