R data.table: condition within group, but recorded at first instance in group


Problem description

I have data that looks a bit like this:

df <- data.frame(ID=c(rep(1,4),rep(2,2),rep(3,2),4), TYPE=c(1,3,2,4,1,2,2,3,2),
                 SEQUENCE=c(seq(1,4),1,2,1,2,1))

ID  TYPE  SEQUENCE
1   1     1
1   3     2
1   2     3
1   4     4
2   1     1
2   2     2
3   2     1
3   3     2
4   2     1

I now need to check whether a certain type is present in each ID block (binary), but record the answer only in the first record of each block (SEQUENCE == 1).
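The per-block check on its own is a grouped `any()`. Below is a minimal sketch that builds one summary row per ID. It assumes the type codes are exactly 1 to 4, and the name `presence` is hypothetical, not something from the question:

```r
library(data.table)

DT <- data.table(ID = c(rep(1, 4), rep(2, 2), rep(3, 2), 4),
                 TYPE = c(1, 3, 2, 4, 1, 2, 2, 3, 2),
                 SEQUENCE = c(seq(1, 4), 1, 2, 1, 2, 1))

# One row per ID: 1 if the type occurs anywhere in that ID block, else 0
presence <- DT[, .(A = as.integer(any(TYPE == 1)),
                   B = as.integer(any(TYPE == 2)),
                   C = as.integer(any(TYPE == 3)),
                   D = as.integer(any(TYPE == 4))),
               by = ID]
```

This answers the "is it present" part of the question. The remaining work is placing these values on the SEQUENCE == 1 row of each block.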

The best I have come up with so far is coding each type in the row where it occurs, e.g.:

library(data.table)
DT <- data.table(df)
DT$A[DT$TYPE==1] <- 1
DT$B[DT$TYPE==2] <- 1
DT$C[DT$TYPE==3] <- 1
DT$D[DT$TYPE==4] <- 1
DT[is.na(DT)] <- 0
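The same row-level flags can also be added by reference with `:=`. This avoids growing columns with `$<-` and removes the need for the NA clean-up step. The sketch below assumes the four type codes are exactly 1 to 4. `flag_cols` is a hypothetical helper name:

```r
library(data.table)

DT <- data.table(ID = c(rep(1, 4), rep(2, 2), rep(3, 2), 4),
                 TYPE = c(1, 3, 2, 4, 1, 2, 2, 3, 2),
                 SEQUENCE = c(seq(1, 4), 1, 2, 1, 2, 1))

# Add A..D by reference: column k is 1 where TYPE == k and 0 elsewhere,
# so there are no NA placeholders to fill in afterwards.
flag_cols <- c("A", "B", "C", "D")
DT[, (flag_cols) := lapply(1:4, function(k) as.integer(TYPE == k))]
```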

RESULT:

ID  TYPE  SEQUENCE  A B C D
1   1     1         1 0 0 0
1   3     2         0 0 1 0
1   2     3         0 1 0 0
1   4     4         0 0 0 1
2   1     1         1 0 0 0
2   2     2         0 1 0 0
3   2     1         0 1 0 0
3   3     2         0 0 1 0
4   2     1         0 1 0 0

However, the result should look like this:

ID  TYPE  SEQUENCE  A B C D
1   1     1         1 1 1 1
1   3     2         0 0 0 0
1   2     3         0 0 0 0
1   4     4         0 0 0 0
2   1     1         1 1 0 0
2   2     2         0 0 0 0
3   2     1         0 1 1 0
3   3     2         0 0 0 0
4   2     1         0 1 0 0

I assume this can be done with data.table, but I haven't quite found the correct syntax.

Solution

This makes one copy of the data.table (in the dcast step; the := steps modify it by reference):

DT[, FAC := factor(TYPE, labels=LETTERS[1:4])]

DT <- dcast.data.table(DT, ID+TYPE+SEQUENCE~FAC, fun.aggregate=length)
DT[,LETTERS[1:4] := lapply(.SD, 
                           function(x) c(any(as.logical(x)), rep(0L, length(x)-1))),
   .SDcols=LETTERS[1:4], by=ID]
#   ID TYPE SEQUENCE A B C D
#1:  1    1        1 1 1 1 1
#2:  1    2        3 0 0 0 0
#3:  1    3        2 0 0 0 0
#4:  1    4        4 0 0 0 0
#5:  2    1        1 1 1 0 0
#6:  2    2        2 0 0 0 0
#7:  3    2        1 0 1 1 0
#8:  3    3        2 0 0 0 0
#9:  4    2        1 0 1 0 0
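`dcast` returns its rows sorted by ID, TYPE and SEQUENCE. You can see this in output row 2, which is TYPE 2 / SEQUENCE 3. As a result, the "first row of the group" above is the row with the smallest TYPE, and in this data that happens to be the SEQUENCE == 1 row.

If that may not hold for real data, the sketch below skips the reshape and keys on SEQUENCE == 1 explicitly. It also keeps the original row order. It assumes type codes 1 to 4, and `flag_cols` is a hypothetical name:

```r
library(data.table)

DT <- data.table(ID = c(rep(1, 4), rep(2, 2), rep(3, 2), 4),
                 TYPE = c(1, 3, 2, 4, 1, 2, 2, 3, 2),
                 SEQUENCE = c(seq(1, 4), 1, 2, 1, 2, 1))

flag_cols <- c("A", "B", "C", "D")
# Within each ID block: any(TYPE == k) says whether type k occurs anywhere
# in the block; multiplying by (SEQUENCE == 1) keeps that 0/1 answer on the
# first record only and sets every other row of the block to 0.
DT[, (flag_cols) := lapply(1:4, function(k)
       as.integer(any(TYPE == k)) * as.integer(SEQUENCE == 1)),
   by = ID]
```

Because everything is done with `:=`, no copy of the table is made. The rows stay in their original order, matching the expected output in the question.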
