Find the 75th percentile and replace values above it by the median for each group in R

Question

This problem is similar to my own earlier question, "calculation of 90 percentile and replacement of it by median by groups in R", with the following difference.

In that question, the quantile was computed from the 14 zero-action rows preceding the action category, but the replacement by the median was applied to all zero-action rows, separately for each code+item group.

Now I want to use all zero-action rows, not just the 14 preceding ones, and leave negative and zero values of return untouched.

Within the zero category of the group variable action (values 0 and 1), I want to find the 75th percentile of return; any value above the 75th percentile must be replaced by the median of the zero category. Since there is also a code variable, this procedure must be performed separately for each code. Note: negative and zero values must not be touched.

mydat=structure(list(code = c(123L, 123L, 123L, 123L, 123L, 123L, 123L, 
123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L, 124L, 124L, 124L, 
124L, 124L, 124L, 124L, 124L, 124L), action = c(0L, 0L, 0L, 0L, 
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 
1L, 1L, 1L, 1L), return = c(-1L, 0L, 23L, 100L, 18L, 15L, -1L, 
0L, 23L, 100L, 18L, 15L, -1L, 0L, 23L, 100L, 18L, 15L, -1L, 0L, 
23L, 100L, 18L, 15L)), .Names = c("code", "action", "return"), class = "data.frame", row.names = c(NA, 
-24L))

Positive returns in the zero-action rows of each code: 23, 100, 18, 15

How can I get the following output? For these values the 75th percentile is 42.25 and the median is 20.5, so the value 100 (greater than 42.25) is replaced by 20.5 (marked with ***):

code  action   return
 123       0       -1
 123       0        0
 123       0       23
 123       0  ***20.5
 123       0       18
 123       0       15
 123       1       -1
 123       1        0
 123       1       23
 123       1      100
 123       1       18
 123       1       15
 124       0       -1
 124       0        0
 124       0       23
 124       0  ***20.5
 124       0       18
 124       0       15
 124       1       -1
 124       1        0
 124       1       23
 124       1      100
 124       1       18
 124       1       15
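The values 42.25 (75th percentile) and 20.5 (median) follow from R's default quantile definition (`type = 7`), which interpolates linearly between order statistics. A small check, assuming the statistics are computed from the positive returns 23, 100, 18 and 15 only:

```r
pos <- c(23, 100, 18, 15)

# R's default (type 7) median and 75% quantile
stats <- quantile(pos, c(0.5, 0.75), names = FALSE)   # 20.5 42.25

# The 75% quantile by hand: position h = (n - 1) * p + 1 in the sorted values
x  <- sort(pos)                    # 15 18 23 100
h  <- (length(x) - 1) * 0.75 + 1   # 3.25
lo <- floor(h)
q75_manual <- x[lo] + (h - lo) * (x[lo + 1] - x[lo])   # 23 + 0.25 * 77 = 42.25
```

Only 100 exceeds 42.25, which is why it is the single value replaced in each code.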

Using Uwe's great solution, I get this error:

Error in `[.data.table`(mydat[action == 0, `:=`(output, as.double(return))],  : 
  Column(s) [action] not found in i

How do I leave the negative and zero values untouched, and why does this error occur?

library(data.table)
library(magrittr)
# mark the zero action rows before the action period
setDT(mydat)[, zero_before := cummax(action), by = .(code)]
# compute median and 75% quantile (stored as q90) for the last rows before each action period;
# note that tail() returns only the last 6 rows by default
agg <- mydat[zero_before == 0, 
             quantile(tail(return), c(0.5, 0.75)) %>% 
               as.list()  %>% 
               set_names(c("med", "q90")) %>% 
               c(.(zero_before = 0)), by = .(code)]
agg


# append output column
mydat[action == 0, output := as.double(return)][
  # replace output values greater q90 in an update non-equi join;
  # agg is grouped by code only and has no `action` column, hence the error above
  agg, on = .(code, action, return > q90), output := as.double(med)][
    # remove helper column
    , zero_before := NULL]

Answer

If I understand correctly, the OP wants to compute the median and the 75% quantile of return within each group, based on all zero-action rows where return is greater than 0. Then any return value in a zero-action row that exceeds the 75% quantile of its group is replaced by the group median.

The code can be simplified considerably because we no longer have to distinguish between zero-action rows before and after the action rows.
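To see what the dropped distinction was, here is a small sketch of the question's helper flag `cummax(action)` on a hypothetical action sequence for one code (not taken from `mydat`), where zero rows occur both before and after the action rows:

```r
# Hypothetical action sequence: zero rows before AND after the action rows
action <- c(0L, 0L, 1L, 1L, 0L, 0L)

# The question's helper: running maximum, 0 only until the first action row
zero_before <- cummax(action)   # 0 0 1 1 1 1

which(zero_before == 0)   # rows 1, 2: only the zero rows before the action period
which(action == 0)        # rows 1, 2, 5, 6: all zero-action rows, as now required
```

Since the new requirement uses every zero-action row, filtering on `action == 0` is enough and the helper column can go.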

The code below reproduces the expected result:

library(data.table)
library(magrittr)
# compute median and 75% quantile of the positive returns in the zero action rows,
# grouped by code and action so that agg carries the `action` column used in the join
agg <- setDT(mydat)[action == 0 & return > 0, 
                    quantile(return, c(0.5, 0.75)) %>% 
                      as.list()  %>% 
                      set_names(c("med", "q75")), by = .(code, action)]

# append output column (a copy of return for all rows)
mydat[, output := as.double(return)][
  # replace output values greater q75 in an update non-equi join;
  # agg only holds action == 0 rows and q75 > 0, so action 1 rows and values <= 0 stay unchanged
  agg, on = .(code, action, return > q75), output := as.double(med)]
mydat[]

    code action return output
 1:  123      0     -1   -1.0
 2:  123      0      0    0.0
 3:  123      0     23   23.0
 4:  123      0    100   20.5
 5:  123      0     18   18.0
 6:  123      0     15   15.0
 7:  123      1     -1   -1.0
 8:  123      1      0    0.0
 9:  123      1     23   23.0
10:  123      1    100  100.0
11:  123      1     18   18.0
12:  123      1     15   15.0
13:  124      0     -1   -1.0
14:  124      0      0    0.0
15:  124      0     23   23.0
16:  124      0    100   20.5
17:  124      0     18   18.0
18:  124      0     15   15.0
19:  124      1     -1   -1.0
20:  124      1      0    0.0
21:  124      1     23   23.0
22:  124      1    100  100.0
23:  124      1     18   18.0
24:  124      1     15   15.0
    code action return output
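For comparison, a minimal base R sketch of the same rule, assuming data.table is not available. It uses `ave()` to apply a per-code function to the zero-action rows instead of the non-equi join; the data is rebuilt with `rep()` so the block runs on its own.

```r
# Example data from the question, rebuilt with rep()
mydat <- data.frame(
  code   = rep(c(123L, 124L), each = 12),
  action = rep(rep(c(0L, 1L), each = 6), times = 2),
  return = rep(c(-1L, 0L, 23L, 100L, 18L, 15L), times = 4)
)

# Start from a copy of return for every row
mydat$output <- as.double(mydat$return)

# Work only on the zero-action rows, one code at a time
zero <- mydat$action == 0
mydat$output[zero] <- ave(mydat$output[zero], mydat$code[zero], FUN = function(x) {
  pos <- x[x > 0]                     # statistics use positive returns only
  if (length(pos) == 0) return(x)     # no positive values: leave the group as is
  q75 <- quantile(pos, 0.75, names = FALSE)
  # q75 > 0, so negative and zero values can never be replaced
  ifelse(x > q75, median(pos), x)
})

mydat[mydat$output != mydat$return, ]  # rows 4 and 16: return 100 -> output 20.5
```

The `output` column matches the data.table result above.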
