Replace values with the median of the three zeros before and three zeros after a run of ones, by group, in R


Question

Say I have the following dataset:

 mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "25481МСК", class = "factor"), 
    item = c(13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 13163L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 13164L, 
    13164L, 13164L, 13164L, 13164L, 13164L, 13164L), sales = c(4L, 
    1L, 10L, 6L, 8L, 3L, 11L, 6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L, 
    4L, 15L, 10L, 6L, 6L, 5L, 4L, 4L, 1L, 10L, 6L, 8L, 3L, 11L, 
    6L, 4L, 2L, 4L, 2L, 4L, 3L, 10L, 4L, 15L, 10L, 6L, 6L, 5L, 
    4L), action = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
    0L, 0L, 0L)), .Names = c("code", "item", "sales", "action"
), class = "data.frame", row.names = c(NA, -44L))

I have two grouping variables, code and item. Here are the two groups:

25481МСК    13163
25480МСК    13164

There is also an action column, which can only take the values 0 or 1. I need to calculate the median of sales over the three action = 0 rows immediately preceding a run of ones and the three action = 0 rows immediately following it.

Here is an example:

sales   action  output
2          0    2
4          0    4
3          0    3
10         1    **5**
4          1    **5**
15         1    **5**
10         0    10
6          0    6
6          0    6

median(2, 4, 3, 10, 6, 6) = 5

So the median of the zero-action rows before and after the run of ones is 5, and the sales values on the action = 1 rows enclosed by those zeros are replaced with this median. A group may contain several such runs of ones, and the same rule must be applied to each of them. However, if the median is greater than the sales value, do not replace it.
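To make the arithmetic concrete, here is a small R check of the median for the example table. The variable names are made up for illustration, and the sketch assumes the six surrounding values are pooled before a single median is taken, which is what yields 5 here.

```r
# Illustrative names only; the values come from the example table.
before <- c(2, 4, 3)    # three action == 0 rows before the run of ones
after  <- c(10, 6, 6)   # three action == 0 rows after the run of ones

# Pool all six values, then take one median
med <- median(c(before, after))
print(med)
```

Averaging the two separate medians (3 and 6) would give 4.5 instead, so pooling is the reading that matches the stated result.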

For example, suppose we have

sales   action
10       1
5        1
14       1

and the median of the surrounding zeros is 12. In this case the output would be

output
10
5
12

Only 14 must be replaced, because it is greater than the median.
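One way to express this "replace only if sales exceed the median" rule in R is an element-wise minimum. This is a sketch with made-up variable names, using the numbers from the example above.

```r
promo_sales <- c(10, 5, 14)  # sales on the action == 1 rows
med <- 12                    # median of the surrounding action == 0 rows

# pmin() takes the element-wise minimum, so values at or below the
# median are kept and larger values are capped at the median.
output <- pmin(promo_sales, med)
print(output)  # 10 5 12
```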

So in the real case:

sales   action  output
2          0    2
4          0    4
3          0    3
10         1    **5**
4          1    **4**
15         1    **5**
10         0    10
6          0    6
6          0    6

This should be done separately for each group:

25481МСК    13163
25480МСК    13164

Desired output:

 code        item sales action output
1  25481МСК 13163     4      0      4
2  25481МСК 13163     1      0      1
3  25481МСК 13163    10      0     10
4  25481МСК 13163     6      0      6
5  25481МСК 13163     8      0      8
6  25481МСК 13163     3      0      3
7  25481МСК 13163    11      0     11
8  25481МСК 13163     6      0      6
9  25481МСК 13163     4      0      4
10 25481МСК 13163     2      0      2
11 25481МСК 13163     4      0      4
12 25481МСК 13163     2      0      2
13 25481МСК 13163     4      0      4
14 25481МСК 13163     3      0      3
15 25481МСК 13163    10      1      5
16 25481МСК 13163     4      1      5
17 25481МСК 13163    15      1      5
18 25481МСК 13163    10      0     10
19 25481МСК 13163     6      0      6
20 25481МСК 13163     6      0      6
21 25481МСК 13163     5      0      5
22 25481МСК 13163     4      0      4
23 25481МСК 13164     4      0      4
24 25481МСК 13164     1      0      1
25 25481МСК 13164    10      0     10
26 25481МСК 13164     6      0      6
27 25481МСК 13164     8      0      8
28 25481МСК 13164     3      0      3
29 25481МСК 13164    11      0     11
30 25481МСК 13164     6      0      6
31 25481МСК 13164     4      0      4
32 25481МСК 13164     2      0      2
33 25481МСК 13164     4      0      4
34 25481МСК 13164     2      0      2
35 25481МСК 13164     4      0      4
36 25481МСК 13164     3      0      3
37 25481МСК 13164    10      1      5
38 25481МСК 13164     4      1      5
39 25481МСК 13164    15      1      5
40 25481МСК 13164    10      0     10
41 25481МСК 13164     6      0      6
42 25481МСК 13164     6      0      6
43 25481МСК 13164     5      0      5
44 25481МСК 13164     4      0      4

Note that the sales values for action = 0 should also appear in the output column. How can I do this?

P.S. Please ignore the fact that some medians in this output are greater than the corresponding sales values. It is just a test.

code    item    sales   action  output
52382МСК    11709   1   0   1
52382МСК    11709   10  1   NA
52382МСК    11709   1   0   1
52382МСК    11709   3   0   3

Recommended answer

I think this gets near a solution? (To be honest, I'm not sure I fully understand the question.)

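library(dplyr)

# Note: this assumes every run of action == 1 is exactly three rows long and
# has at least three rows before and after it within the same group.
replacements <-
  tibble(
    # Row positions of the action == 1 rows
    action1      = which(mydat$action == 1L),
    # Label each consecutive block of three such rows as one run
    group        = rep(seq_along(action1), each = 3, length.out = length(action1)),
    sales1       = mydat$sales[action1],
    # Sales three rows before and three rows after each action row
    sales_before = mydat$sales[action1 - 3L],
    sales_after  = mydat$sales[action1 + 3L]
  ) %>%
  group_by(group) %>%
  mutate(
    # Pooled median of the three values before and the three after
    med    = median(c(sales_before, sales_after)),
    # Replace only where sales exceed the median
    output = pmin(sales1, med)
  )

# Start from sales, then overwrite the action == 1 rows
mydat$output <- mydat$sales
mydat$output[replacements$action1] <- replacements$output

mydat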
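The code above relies on every run of ones being exactly three rows long and never sitting near a group boundary. As a possible alternative, here is a base R sketch that finds runs with rle() and processes each code + item group separately. The helper name cap_runs, the parameter k, and the toy data are all hypothetical. It also assumes that when fewer than three zeros are available on one side, whatever zeros exist are used, which the question does not specify.

```r
# Hypothetical helper: within ONE group, cap each run of action == 1 at the
# median of up to k action == 0 sales before it and up to k after it.
# Assumes action contains only 0 and 1 (no NA).
cap_runs <- function(sales, action, k = 3L) {
  out <- as.numeric(sales)
  r <- rle(action)
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  for (i in which(r$values == 1)) {
    # Neighbouring runs are zero runs, because runs alternate between 0 and 1
    before <- if (i > 1L) tail(out[starts[i - 1L]:ends[i - 1L]], k) else numeric(0)
    after <- if (i < length(ends)) head(out[starts[i + 1L]:ends[i + 1L]], k) else numeric(0)
    ref <- c(before, after)
    if (length(ref) > 0L) {
      idx <- starts[i]:ends[i]
      out[idx] <- pmin(out[idx], median(ref))
    }
  }
  out
}

# Toy data: item 1 repeats the question's example, item 2 repeats the
# short case in which only one zero precedes the single action row.
dat <- data.frame(
  code   = "A",
  item   = c(rep(1L, 9), rep(2L, 4)),
  sales  = c(2, 4, 3, 10, 4, 15, 10, 6, 6, 1, 10, 1, 3),
  action = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0)
)

# Apply the helper per code + item group and put results back in row order
grp <- interaction(dat$code, dat$item, drop = TRUE)
dat$output <- unsplit(
  lapply(split(dat, grp), function(d) cap_runs(d$sales, d$action)),
  grp
)
print(dat)
```

For item 1 this gives 5, 4, 5 on the action rows, matching the "real case" table in the question. For item 2 the single action row is capped at 1, the median of the available zeros (1, 1, 3), under the assumption stated above.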
