Calculate second highest cumulative value by group


Question

I have data with a grouping variable 'grps' and a value 'x'. I have calculated the cumulative maximum (cummax) within each group, 'cmx'. Now I need to find the second highest cumulative value of 'x' within each group, 'scmx'.

Some data, including the desired column scmx:

library(data.table)
d = structure(list(date = structure(rep(c(18690, 18691, 18692, 18693, 18694, 18695, 18696, 18697), 2), class = "Date"),
                   x = c(18, 70, 57, 94, 94, 13, 98, 23, 20, 72, 59, 96, 96, 15, 100, 25),
                   grps = c(rep("g1", 8), rep("g2", 8))),
              row.names = c(NA, -16L), class = c("data.table", "data.frame"))
d[, cmx := cummax(x), by = .(grps)]
d[, scmx := c(18, 18, 57, 70, 70, 70, 94, 94, 20, 20, 59, 72, 72, 72, 96, 96)]


Context

If x corresponds to a performance rating, what I am trying to do is locate the dates when they achieved their best performance and their second best. I asked a similar question where I needed to locate the row corresponding to the highest cumulative value in a column:

Rolling the first row down within each cumulative maximum

Recommended answer

A data.table alternative:

d[ , scmx2 := {
  c(x[1], sapply(seq(.N)[-1], function(i){
    v = x[1:i]
    v[frank(-v, ties.method = "dense") == 2][1]
  }))
}, by = grps]

#           date   x grps cmx scmx scmx2
#  1: 2021-03-04  18   g1  18   18    18
#  2: 2021-03-05  70   g1  70   18    18
#  3: 2021-03-06  57   g1  70   57    57
#  4: 2021-03-07  94   g1  94   70    70
#  5: 2021-03-08  94   g1  94   70    70
#  6: 2021-03-09  13   g1  94   70    70
#  7: 2021-03-10  98   g1  98   94    94
#  8: 2021-03-11  23   g1  98   94    94
#  9: 2021-03-04  20   g2  20   20    20
# 10: 2021-03-05  72   g2  72   20    20
# 11: 2021-03-06  59   g2  72   59    59
# 12: 2021-03-07  96   g2  96   72    72
# 13: 2021-03-08  96   g2  96   72    72
# 14: 2021-03-09  15   g2  96   72    72
# 15: 2021-03-10 100   g2 100   96    96
# 16: 2021-03-11  25   g2 100   96    96


Within each group (by = grps), loop (sapply) over a sequence from 2 to the number of rows in the current group (seq(.N)[-1]). In each step, subset 'x' from the start of the vector to the index 'i' (v = x[1:i]).

Calculate the dense rank and check whether it is 2 (frank(-v, ties.method = "dense") == 2), i.e. the rank of the second largest number. Use the logical indices to subset 'v' (v[...]). Select the first match ([1]), in case several values have rank 2. Concatenate the result from this 'expanding window' with the first element of 'x' (c(x[1], ...)).
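To make the ranking step concrete, here is a small trace of a single window, using the first five values of group g1 from the question (the window for i = 5). The variable names r, second and w are only illustrative and do not appear in the answer.

```r
library(data.table)

v <- c(18, 70, 57, 94, 94)  # window x[1:5] of group g1

# Dense ranks of the negated values: the largest value gets rank 1,
# tied values share a rank, and no rank numbers are skipped.
r <- frank(-v, ties.method = "dense")
r  # 4 2 3 1 1

# Keep the values holding rank 2, then take the first of them.
second <- v[r == 2][1]
second  # 70

# If every value in the window is equal, no value holds rank 2,
# so indexing the empty result with [1] yields NA.
w <- c(3, 3)
w_second <- w[frank(-w, ties.method = "dense") == 2][1]
w_second  # NA
```

The tied 94s share rank 1, which is why 70 rather than 94 comes out as the second highest value of this window.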

In the first window, with only one value, there is clearly no second highest value. Here the OP has chosen to return the first value. The same choice also needs to be made for longer windows where all values are equal, which occurs when a group starts with a run of equal values. If we would rather return NA than the first value, replace the x[1] in the line

c(x[1], sapply(seq(.N)[-1], function(i){

...with NA_real_.

Small demo:

d = data.table(grps = c(1, 1, 2, 2, 2), x = c(3, 3, 4, 4, 5)) 

d[ , scmx2 := {
  c(NA_real_, sapply(seq(.N)[-1], function(i){
    v = x[1:i]
    v[frank(-v, ties.method = "dense") == 2][1]
  }))
}, by = grps]

#    grps x scmx2
# 1:    1 3    NA # grp 1: all values equal in all windows -> all NA
# 2:    1 3    NA
# 3:    2 4    NA
# 4:    2 4    NA
# 5:    2 5     4 # grp 2: only the last window has a second highest value


This question is indeed similar to the post I linked to above (Finding cumulative second max per group in R). However, here the OP asked for a data.table solution.
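The expanding-window approach re-ranks every prefix of each group, so the work grows roughly quadratically with group size. For long groups, a single pass that tracks the highest and second-highest distinct values may be worth considering. The following is only a sketch and not part of the original answer: running_second_max and its fill argument are hypothetical names, and it assumes 'x' is numeric with no NA values.

```r
# Hypothetical helper (not from the original answer): running second-highest
# distinct value in one pass. Assumes `x` is numeric and contains no NA.
# `fill` is returned while no second-highest value exists yet.
running_second_max <- function(x, fill = x[1]) {
  out <- numeric(length(x))
  best <- x[1]          # highest value seen so far
  second <- NA_real_    # second-highest distinct value seen so far
  for (i in seq_along(x)) {
    if (x[i] > best) {
      second <- best    # the old maximum becomes the runner-up
      best <- x[i]
    } else if (x[i] < best && (is.na(second) || x[i] > second)) {
      second <- x[i]    # new runner-up below the current maximum
    }
    out[i] <- if (is.na(second)) fill else second
  }
  out
}

g1 <- c(18, 70, 57, 94, 94, 13, 98, 23)  # group g1 from the question
res_first <- running_second_max(g1)
res_first  # 18 18 57 70 70 70 94 94

res_na <- running_second_max(c(4, 4, 5), fill = NA_real_)
res_na     # NA NA 4
```

Under the same assumptions, it could be applied per group with something like d[, scmx3 := running_second_max(x), by = grps]. Note that, unlike the expanding-window code, the default fill also returns the first value for longer windows in which all values are equal.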
