Counting after and before a change in value, within groups, generating new variables for each unique shift

Problem description

I am working to count occurrences of unique values within my groups, id. I'm looking at TF. When TF changes I want to count both forward and backwards from that point. This counting should be stored in a new variable PM#, so that PM# holds both plus and minus to each unique shift in TF. From what I've gathered I need to use rle, but I am kinda stuck.

I made this working example to illustrate my issue.

I have this data

df <- structure(list(id = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L), TF = c(NA, 0L, NA, 0L, 0L, 1L, 1L, 1L, NA, 0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 1L, NA, NA, 0L, 0L, 1L, 0L, 0L, 1L, 
0L, 1L, 1L, 1L)), .Names = c("id", "TF"), class = "data.frame", row.names = c(NA, 
-30L))

This is the kinda data I am seeing

df[c(1:12,19:30),]
#>    id TF
#> 1   0 NA
#> 2   0  0
#> 3   0 NA
#> 4   0  0
#> 5   0  0
#> 6   0  1
#> 7   0  1
#> 8   0  1
#> 9   0 NA
#> 10  0  0
#> 11  0  0
#> 12  1 NA
#> 19  1 NA
#> 20  7 NA
#> 21  7  0
#> 22  7  0
#> 23  7  1
#> 24  7  0
#> 25  7  0
#> 26  7  1
#> 27  7  0
#> 28  7  1
#> 29  7  1
#> 30  7  1

I've started meddling with ave, cumsum and with rle, but haven't solved it this way yet.

df$PM01 <- with(df, ifelse(is.na(TF), NA, 1))
df$PM01 <- with(df, ave(PM01, TF, id, FUN=cumsum))

with(df, tapply(TF, rep(rle(id)[[2]], rle(id)[[1]]), count))

This is what I am trying to obtain,

dfa <- structure(list(id = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L), TF = c(NA, 0L, NA, 0L, 0L, 1L, 1L, 1L, NA, 0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 1L, NA, NA, 0L, 0L, 1L, 0L, 0L, 1L, 
0L, 1L, 1L, 1L), PM1 = c(NA, -3L, NA, -2L, -1L, 1L, 2L, 3L, NA, 
NA, NA, NA, -3L, -2L, -1L, 1L, 2L, 3L, NA, NA, -2L, -1L, 1L, 
NA, NA, NA, NA, NA, NA, NA), PM2 = c(NA, NA, NA, NA, NA, -3L, 
-2L, -1L, NA, 1L, 2L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, -1L, 1L, 2L, NA, NA, NA, NA, NA), PM3 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, -2L, -1L, 1L, NA, NA, NA, NA), PM4 = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, -1L, 1L, NA, NA, NA), PM5 = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, -1L, 1L, 2L, 3L)), .Names = c("id", 
"TF", "PM1", "PM2", "PM3", "PM4", "PM5"), class = "data.frame", row.names = c(NA, 
-30L))

dfa[c(1:12,19:30),]
#>    id TF PM1 PM2 PM3 PM4 PM5
#> 1   0 NA  NA  NA  NA  NA  NA
#> 2   0  0  -3  NA  NA  NA  NA
#> 3   0 NA  NA  NA  NA  NA  NA
#> 4   0  0  -2  NA  NA  NA  NA
#> 5   0  0  -1  NA  NA  NA  NA
#> 6   0  1   1  -3  NA  NA  NA
#> 7   0  1   2  -2  NA  NA  NA
#> 8   0  1   3  -1  NA  NA  NA
#> 9   0 NA  NA  NA  NA  NA  NA
#> 10  0  0  NA   1  NA  NA  NA
#> 11  0  0  NA   2  NA  NA  NA
#> 12  1 NA  NA  NA  NA  NA  NA
#> 19  1 NA  NA  NA  NA  NA  NA
#> 20  7 NA  NA  NA  NA  NA  NA
#> 21  7  0  -2  NA  NA  NA  NA
#> 22  7  0  -1  NA  NA  NA  NA
#> 23  7  1   1  -1  NA  NA  NA
#> 24  7  0  NA   1  -2  NA  NA
#> 25  7  0  NA   2  -1  NA  NA
#> 26  7  1  NA  NA   1  -1  NA
#> 27  7  0  NA  NA  NA   1  -1
#> 28  7  1  NA  NA  NA  NA   1
#> 29  7  1  NA  NA  NA  NA   2
#> 30  7  1  NA  NA  NA  NA   3

Solution

This was really a tricky one, and I'm sure the code can be further improved. However, I was able to reproduce your expected result. Please, try this approach with your production data. If OK, I will add an explanation later.

library(data.table)

tmp <- setDT(df)[, rn := .I][!is.na(TF)][, rl := rleid(TF), by = id][
  , c("up", "dn") := .(seq_len(.N), -rev(seq_len(.N))), by = .(id, rl)][]

res <- tmp[tmp[, seq_len(max(rl) - 1L), by = .(id)], on = .(id), allow.cartesian = TRUE][
  rl == V1, PM := dn][rl == V1 + 1L, PM := up][
    , dcast(.SD, id + TF + rn ~ paste0("PM", V1), value.var = "PM")][
      df, on = .(rn, id, TF)][, -"rn"]
res

    id TF PM1 PM2 PM3 PM4 PM5
 1:  0 NA  NA  NA  NA  NA  NA
 2:  0  0  -3  NA  NA  NA  NA
 3:  0 NA  NA  NA  NA  NA  NA
 4:  0  0  -2  NA  NA  NA  NA
 5:  0  0  -1  NA  NA  NA  NA
 6:  0  1   1  -3  NA  NA  NA
 7:  0  1   2  -2  NA  NA  NA
 8:  0  1   3  -1  NA  NA  NA
 9:  0 NA  NA  NA  NA  NA  NA
10:  0  0  NA   1  NA  NA  NA
11:  0  0  NA   2  NA  NA  NA
12:  1 NA  NA  NA  NA  NA  NA
13:  1  0  -3  NA  NA  NA  NA
14:  1  0  -2  NA  NA  NA  NA
15:  1  0  -1  NA  NA  NA  NA
16:  1  1   1  NA  NA  NA  NA
17:  1  1   2  NA  NA  NA  NA
18:  1  1   3  NA  NA  NA  NA
19:  1 NA  NA  NA  NA  NA  NA
20:  7 NA  NA  NA  NA  NA  NA
21:  7  0  -2  NA  NA  NA  NA
22:  7  0  -1  NA  NA  NA  NA
23:  7  1   1  -1  NA  NA  NA
24:  7  0  NA   1  -2  NA  NA
25:  7  0  NA   2  -1  NA  NA
26:  7  1  NA  NA   1  -1  NA
27:  7  0  NA  NA  NA   1  -1
28:  7  1  NA  NA  NA  NA   1
29:  7  1  NA  NA  NA  NA   2
30:  7  1  NA  NA  NA  NA   3
    id TF PM1 PM2 PM3 PM4 PM5

# verify results are identical
identical(res, dfa)

[1] TRUE

In case of more than 9 changes per group paste0("PM", V1) should be replaced by sprintf("PM%02d",V1) in the call to dcast() to ensure the PM columns are ordered properly.
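
The ordering issue can be seen without any reshaping. The following is a minimal base R sketch, assuming that the new columns end up in the sorted order of their names; the vector names `changes`, `plain` and `padded` are made up for illustration, and `method = "radix"` is only used so that the sort does not depend on the locale.

```r
# Hypothetical example: 11 changes within one group
changes <- 1:11

plain  <- paste0("PM", changes)        # "PM1", "PM2", ..., "PM11"
padded <- sprintf("PM%02d", changes)   # "PM01", "PM02", ..., "PM11"

# Plain names sort as text, so "PM10" and "PM11" land before "PM2"
sorted_plain <- sort(plain, method = "radix")
print(sorted_plain)

# Zero-padded names keep their numeric order when sorted as text
sorted_padded <- sort(padded, method = "radix")
print(sorted_padded)
```

With zero padding the column names become PM01, PM02, and so on, which is worth keeping in mind when comparing the result against an expected data frame whose columns are named PM1, PM2, ...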

Explanation

tmp <- 
  # coerce to data.table
  setDT(df)[
    # create row id column (required for final join to get NA rows back in)
    , rn := .I][
      # ignore NA rows 
      !is.na(TF)][
        # number streaks of unique values within each group
        , rl := rleid(TF), by = id][
          # create ascending and descending counts for each streak
          # this is done once to avoid repeatedly creating the counts for each PM
          # (slight performance gain)
          , c("up", "dn") := .(seq_len(.N), -rev(seq_len(.N))), by = .(id, rl)]


tmp[]

    id TF rn rl up dn
 1:  0  0  2  1  1 -3
 2:  0  0  4  1  2 -2
 3:  0  0  5  1  3 -1
 4:  0  1  6  2  1 -3
 5:  0  1  7  2  2 -2
 6:  0  1  8  2  3 -1
 7:  0  0 10  3  1 -2
 8:  0  0 11  3  2 -1
 9:  1  0 13  1  1 -3
10:  1  0 14  1  2 -2
11:  1  0 15  1  3 -1
12:  1  1 16  2  1 -3
13:  1  1 17  2  2 -2
14:  1  1 18  2  3 -1
15:  7  0 21  1  1 -2
16:  7  0 22  1  2 -1
17:  7  1 23  2  1 -1
18:  7  0 24  3  1 -2
19:  7  0 25  3  2 -1
20:  7  1 26  4  1 -1
21:  7  0 27  5  1 -1
22:  7  1 28  6  1 -3
23:  7  1 29  6  2 -2
24:  7  1 30  6  3 -1
    id TF rn rl up dn
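
For readers who want to see what the streak number and the two counters look like without data.table, here is a rough base R sketch for a single group. It is not the code from the answer: it swaps `rleid()` for base `rle()`, and the vector `x` is simply the non-NA TF values of id 0 typed in by hand.

```r
# Non-NA TF values of group id == 0 (rows 2, 4, 5, 6, 7, 8, 10, 11)
x <- c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L)

r <- rle(x)                                   # run lengths: 3, 3, 2

# Streak number per row, a base R stand-in for rleid(TF)
rl <- rep(seq_along(r$lengths), r$lengths)

# Ascending count within each streak: 1, 2, ..., run length
up <- sequence(r$lengths)

# Descending count within each streak: -run length, ..., -1
dn <- up - rep(r$lengths, r$lengths) - 1L

print(data.frame(TF = x, rl = rl, up = up, dn = dn))
```

The printed columns should line up with rows 1 to 8 of `tmp` above.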

For the next step, we need the sequence of change numbers V1 within each group (one value per change in TF):

tmp[, seq_len(max(rl) - 1L), by = .(id)]

   id V1
1:  0  1
2:  0  2
3:  1  1
4:  7  1
5:  7  2
6:  7  3
7:  7  4
8:  7  5
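
The number of changes in a group is the number of streaks minus one, which is why `max(rl) - 1L` appears inside `seq_len()`. A small base R sketch of that arithmetic, using the streak counts read off `tmp` above (the name `n_streaks` is made up for illustration):

```r
# Number of streaks (max(rl)) per id, read off the tmp table
n_streaks <- c("0" = 3L, "1" = 2L, "7" = 6L)

# One change number per switch between consecutive streaks
change_numbers <- lapply(n_streaks, function(n) seq_len(n - 1L))
print(change_numbers)

# Edge case: a group with a single streak has no change at all,
# and seq_len(0) is an empty vector rather than c(1, 0)
no_change <- seq_len(1L - 1L)
print(no_change)
```

Using `seq_len()` rather than `1:(n - 1)` matters for the edge case, since `1:0` would yield the two values 1 and 0.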

Now, we create a "cartesian join" of all possible changes with the rows of each group:

# right join with count of changes within each group
tmp[tmp[, seq_len(max(rl) - 1L), by = .(id)], on = .(id), allow.cartesian = TRUE][
  # copy descending counts to rows before the switch
  rl == V1, PM := dn][
    # copy ascending counts to rows after the switch
    rl == V1 + 1L, PM := up][]

    id TF rn rl up dn V1 PM
 1:  0  0  2  1  1 -3  1 -3
 2:  0  0  4  1  2 -2  1 -2
 3:  0  0  5  1  3 -1  1 -1
 4:  0  1  6  2  1 -3  1  1
 5:  0  1  7  2  2 -2  1  2
 6:  0  1  8  2  3 -1  1  3
 7:  0  0 10  3  1 -2  1 NA
 8:  0  0 11  3  2 -1  1 NA
 9:  0  0  2  1  1 -3  2 NA
10:  0  0  4  1  2 -2  2 NA
11:  0  0  5  1  3 -1  2 NA
12:  0  1  6  2  1 -3  2 -3
13:  0  1  7  2  2 -2  2 -2
14:  0  1  8  2  3 -1  2 -1
15:  0  0 10  3  1 -2  2  1
16:  0  0 11  3  2 -1  2  2
17:  1  0 13  1  1 -3  1 -3
18:  1  0 14  1  2 -2  1 -2
19:  1  0 15  1  3 -1  1 -1
20:  1  1 16  2  1 -3  1  1
21:  1  1 17  2  2 -2  1  2
22:  1  1 18  2  3 -1  1  3
23:  7  0 21  1  1 -2  1 -2
24:  7  0 22  1  2 -1  1 -1
25:  7  1 23  2  1 -1  1  1
26:  7  0 24  3  1 -2  1 NA
27:  7  0 25  3  2 -1  1 NA
28:  7  1 26  4  1 -1  1 NA
29:  7  0 27  5  1 -1  1 NA
30:  7  1 28  6  1 -3  1 NA
31:  7  1 29  6  2 -2  1 NA
32:  7  1 30  6  3 -1  1 NA
33:  7  0 21  1  1 -2  2 NA
34:  7  0 22  1  2 -1  2 NA
35:  7  1 23  2  1 -1  2 -1
36:  7  0 24  3  1 -2  2  1
37:  7  0 25  3  2 -1  2  2
38:  7  1 26  4  1 -1  2 NA
39:  7  0 27  5  1 -1  2 NA
40:  7  1 28  6  1 -3  2 NA
41:  7  1 29  6  2 -2  2 NA
42:  7  1 30  6  3 -1  2 NA
43:  7  0 21  1  1 -2  3 NA
44:  7  0 22  1  2 -1  3 NA
45:  7  1 23  2  1 -1  3 NA
46:  7  0 24  3  1 -2  3 -2
47:  7  0 25  3  2 -1  3 -1
48:  7  1 26  4  1 -1  3  1
49:  7  0 27  5  1 -1  3 NA
50:  7  1 28  6  1 -3  3 NA
51:  7  1 29  6  2 -2  3 NA
52:  7  1 30  6  3 -1  3 NA
53:  7  0 21  1  1 -2  4 NA
54:  7  0 22  1  2 -1  4 NA
55:  7  1 23  2  1 -1  4 NA
56:  7  0 24  3  1 -2  4 NA
57:  7  0 25  3  2 -1  4 NA
58:  7  1 26  4  1 -1  4 -1
59:  7  0 27  5  1 -1  4  1
60:  7  1 28  6  1 -3  4 NA
61:  7  1 29  6  2 -2  4 NA
62:  7  1 30  6  3 -1  4 NA
63:  7  0 21  1  1 -2  5 NA
64:  7  0 22  1  2 -1  5 NA
65:  7  1 23  2  1 -1  5 NA
66:  7  0 24  3  1 -2  5 NA
67:  7  0 25  3  2 -1  5 NA
68:  7  1 26  4  1 -1  5 NA
69:  7  0 27  5  1 -1  5 -1
70:  7  1 28  6  1 -3  5  1
71:  7  1 29  6  2 -2  5  2
72:  7  1 30  6  3 -1  5  3
    id TF rn rl up dn V1 PM
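
The rule applied in this step is: for change number k, rows in streak k get the descending count, rows in streak k + 1 get the ascending count, and all other rows stay NA. A hedged base R sketch of that rule for group id 0 only, with `rl`, `up` and `dn` typed in from `tmp` and `sapply()` standing in for the join (this is an illustration, not the answer's implementation):

```r
# rl, up, dn for group id == 0, copied from tmp
rl <- c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L)
up <- c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L)
dn <- c(-3L, -2L, -1L, -3L, -2L, -1L, -2L, -1L)

changes <- seq_len(max(rl) - 1L)   # change numbers 1 and 2

# One column per change: dn before the switch, up after it, NA elsewhere
pm <- sapply(changes, function(k) {
  ifelse(rl == k, dn, ifelse(rl == k + 1L, up, NA_integer_))
})
colnames(pm) <- paste0("PM", changes)
print(pm)
```

The two columns should match PM1 and PM2 for the non-NA rows of id 0 in the expected result.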

Finally, the intermediate result is reshaped from long to wide format.

res <- 
  # create a "cartesian join" of all possible changes with the rows of each group
  tmp[tmp[, seq_len(max(rl) - 1L), by = .(id)], on = .(id), allow.cartesian = TRUE][
    # copy descending counts to rows before the switch
    rl == V1, PM := dn][
      # copy ascending counts to rows after the switch
      rl == V1 + 1L, PM := up][
        # reshape from long to wide with the change number as new columns
        , dcast(.SD, id + TF + rn ~ sprintf("PM%02d", V1), value.var = "PM")][
          # join with original df to get NA rows back in
          df, on = .(rn, id, TF)][
            # omit helper column
            , -"rn"]
