Find the 75th percentile and replace by the median for each group in R
Problem description
This problem is similar to my own earlier topic, "calculation of 90 percentile and replacement of it by median by groups in R", with one difference.
In that topic, the calculation was done on the 14 zero-action rows preceding the action-1 category, whereas the replacement by the median was done over all zero-action rows, for each code+item group.
Now I use all zero-action rows rather than only the 14 preceding ones, and I don't touch negative and zero values of return.
For the zero category of the group variable action (0, 1), I want to find the 75th percentile of the return variable. If a value is greater than the 75th percentile, it must be replaced by the median of the zero category.
There is also a code variable, and this procedure must be performed for each code separately. Note: I don't touch negative and zero values.
mydat=structure(list(code = c(123L, 123L, 123L, 123L, 123L, 123L, 123L,
123L, 123L, 123L, 123L, 123L, 124L, 124L, 124L, 124L, 124L, 124L,
124L, 124L, 124L, 124L, 124L, 124L), action = c(0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 1L, 1L, 1L), return = c(-1L, 0L, 23L, 100L, 18L, 15L, -1L,
0L, 23L, 100L, 18L, 15L, -1L, 0L, 23L, 100L, 18L, 15L, -1L, 0L,
23L, 100L, 18L, 15L)), .Names = c("code", "action", "return"), class = "data.frame", row.names = c(NA,
-24L))
The positive return values in the zero category are:
23
100
18
15
So the 75th percentile is 42,25 and the median is 20,5, which is used as the replacement. How can I get the following output?
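The two figures quoted above can be checked in base R. This is a minimal sketch, assuming R's default quantile definition (type 7, linear interpolation between order statistics); the names `pos`, `q75`, `med` and `by_hand` are only illustrative.

```r
# Positive returns of one zero-action group, as listed above
pos <- c(23, 100, 18, 15)

# Default quantile() uses type 7 (linear interpolation)
q75 <- quantile(pos, 0.75, names = FALSE)
med <- median(pos)

# Same interpolation by hand: position 1 + (n - 1) * 0.75 = 3.25,
# i.e. a quarter of the way from the 3rd to the 4th sorted value
s <- sort(pos)
by_hand <- s[3] + 0.25 * (s[4] - s[3])

print(c(q75 = q75, med = med, by_hand = by_hand))
```

With these four values the 75% quantile works out to 42.25 and the median to 20.5, so only the value 100 exceeds the threshold.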
code action return
123 0 -1
123 0 0
123 0 23
123 0 ***20,5
123 0 18
123 0 15
123 1 -1
123 1 0
123 1 23
123 1 100
123 1 18
123 1 15
124 0 -1
124 0 0
124 0 23
124 0 ***20,5
124 0 18
124 0 15
124 1 -1
124 1 0
124 1 23
124 1 100
124 1 18
124 1 15
Using Uwe's great solution, I get this error:
Error in `[.data.table`(mydat[action == 0, `:=`(output, as.double(return))], :
Column(s) [action] not found in i
How can I leave the negative and zero values untouched, and why does this error occur?
library(data.table)
# mark the zero action rows before the action period
setDT(mydat)[, zero_before := cummax(action), by = .(code)]
# compute median and 90% quantile for that last 14 rows before each action period
agg <- mydat[zero_before == 0,
quantile(tail(return), c(0.5, 0.75)) %>%
as.list() %>%
set_names(c("med", "q90")) %>%
c(.(zero_before = 0)), by = .(code)]
agg
# append output column
mydat[action == 0, output := as.double(return)][
# replace output values greater q90 in an update non-equi join
agg, on = .(code,action, return > q90), output := as.double(med)][
# remove helper column
, zero_before := NULL]
Recommended answer
If I understand correctly, the OP wants to compute the median and the 75% quantile of return within each group, based on all zero-action rows where the return is greater than 0. Then, any return value in a zero-action row which exceeds the 75% quantile of the respective group is to be replaced by the group median.
The code can be greatly simplified, as we do not have to distinguish between zero-action rows before and after the action rows.
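To make the rule concrete for a single group, here is a base-R sketch applied to the zero-action returns of one code. The function name `replace_above_q75` is hypothetical (it is not part of any package), and the sketch assumes there are no NA values in the input.

```r
# Replace values above the 75% quantile of the positive values
# by the median of the positive values; other values are untouched.
replace_above_q75 <- function(x) {
  out <- as.double(x)
  pos <- x[x > 0]
  if (length(pos) == 0L) return(out)   # nothing to base the statistics on
  q75 <- quantile(pos, 0.75, names = FALSE)
  med <- median(pos)
  out[x > q75] <- med
  out
}

# Zero-action returns of one code from the example data
res <- replace_above_q75(c(-1L, 0L, 23L, 100L, 18L, 15L))
print(res)
```

Negative and zero values can never exceed a quantile computed from positive values only, which is why they pass through unchanged.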
The code below reproduces the expected result:
library(data.table)
library(magrittr)
# compute median and 75% quantile of the positive returns in all zero action rows, per code
agg <- setDT(mydat)[action == 0 & return > 0,
quantile(return, c(0.5, 0.75)) %>%
as.list() %>%
set_names(c("med", "q75")), by = .(code, action)]
# append output column
mydat[, output := as.double(return)][
# replace output values greater q75 in an update non-equi join
agg, on = .(code, action, return > q75), output := as.double(med)]
mydat[]
code action return output
1: 123 0 -1 -1.0
2: 123 0 0 0.0
3: 123 0 23 23.0
4: 123 0 100 20.5
5: 123 0 18 18.0
6: 123 0 15 15.0
7: 123 1 -1 -1.0
8: 123 1 0 0.0
9: 123 1 23 23.0
10: 123 1 100 100.0
11: 123 1 18 18.0
12: 123 1 15 15.0
13: 124 0 -1 -1.0
14: 124 0 0 0.0
15: 124 0 23 23.0
16: 124 0 100 20.5
17: 124 0 18 18.0
18: 124 0 15 15.0
19: 124 1 -1 -1.0
20: 124 1 0 0.0
21: 124 1 23 23.0
22: 124 1 100 100.0
23: 124 1 18 18.0
24: 124 1 15 15.0
code action return output
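As for the error quoted in the question, a likely explanation is that the aggregate there was grouped by code only, so it had no action column, while the join listed action in `on`. The sketch below reproduces that situation under this assumption. It rebuilds the example data in compact form, and the names `agg_bad`, `agg_ok` and `err` are illustrative only.

```r
library(data.table)

# Same values as the example data, built compactly
mydat <- data.table(
  code   = rep(c(123L, 124L), each = 12),
  action = rep(rep(0:1, each = 6), times = 2),
  return = rep(c(-1L, 0L, 23L, 100L, 18L, 15L), times = 4)
)
mydat[, output := as.double(return)]

# Grouped by code only: the aggregate has no `action` column
agg_bad <- mydat[action == 0 & return > 0,
                 .(med = as.double(median(return)),
                   q75 = quantile(return, 0.75, names = FALSE)),
                 by = .(code)]

# Joining on a column that is missing from the aggregate fails
err <- tryCatch(
  mydat[agg_bad, on = .(code, action, return > q75), output := med],
  error = function(e) e
)

# Grouping by code and action keeps `action` available for the join
agg_ok <- mydat[action == 0 & return > 0,
                .(med = as.double(median(return)),
                  q75 = quantile(return, 0.75, names = FALSE)),
                by = .(code, action)]
mydat[agg_ok, on = .(code, action, return > q75), output := med]
```

Keeping action in `by` also restricts the update to zero-action rows, because the aggregate only contains rows with action equal to 0.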