Selecting groups by their scenario in R


Question

Here is the data:

mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "52382МСК", class = "factor"), 
    item = c(11709L, 11709L, 11709L, 11709L, 11708L, 11708L, 
    11708L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
    11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
    11710L, 11710L, 11710L, 11710L, 11710L, 11710L), sales = c(30L, 
    10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 
    3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 
    2L, 10L, 3L), action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 
    0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 0L, 0L, 0L)), .Names = c("code", "item", "sales", 
"action"), class = "data.frame", row.names = c(NA, -28L))

The groups are defined by the variables code + item. Here there are 3 groups:

code    item
52382МСК    11709
52382МСК    11708
52382МСК    11710

I also have an action column. It can take only two values: zero (0) or one (1).

Each group represents one of 3 scenarios:

52382МСК    11709

This is the scenario where the action column has one zero before the first 1 and two zeros after it. Note: the same scenario also covers two zeros before the first 1 and one zero after it.

52382МСК    11708

This is the scenario where the action column has one zero before the first 1 and one zero after it.

52382МСК    11710

This is the scenario where the action column has 3 (or more) zeros before the first 1 and 3 (or more) zeros after it.

How can I select the groups that match each scenario? I.e. mydat1 should hold the groups with the first scenario, mydat2 the groups with the second scenario, and mydat3 the groups with the third scenario.

Sample output

mydat1

code    item    sales   action
52382МСК    11709   30  0
52382МСК    11709   10  1
52382МСК    11709   20  0
52382МСК    11709   15  0



mydat2
code    item    sales   action
52382МСК    11708   2   0
52382МСК    11708   10  1
52382МСК    11708   3   0

mydat3
code    item    sales   action
52382МСК    11710   30  0
52382МСК    11710   10  0
52382МСК    11710   20  0
52382МСК    11710   15  1
52382МСК    11710   2   0
52382МСК    11710   10  0
52382МСК    11710   3   0
52382МСК    11710   30  0
52382МСК    11710   10  0
52382МСК    11710   20  0
52382МСК    11710   15  1
52382МСК    11710   2   0
52382МСК    11710   10  0
52382МСК    11710   3   0
52382МСК    11710   30  0
52382МСК    11710   10  0
52382МСК    11710   20  0
52382МСК    11710   15  1
52382МСК    11710   2   0
52382МСК    11710   10  0
52382МСК    11710   3   0



Edit

I forgot: there can also be a scenario where the action column has one zero before the first 1 and three zeros after it, or three zeros before the first 1 and one zero after it.

(mydat4)

There can also be a scenario where the action column has two zeros before the first 1 and three zeros after it, or three zeros before the first 1 and two zeros after it.

(mydat5)

I also found another group that has only one row:

code    item    sales   action
52499МСК    11202   2   0

How can I make a group that has only one row count as scenario 6?

code    item    sales   action
52499МСК    11202   2   0
52499МСК    11202   2   1

code    item    sales   action
52499МСК    11202   2   0
52499МСК    11202   2   0

code    item    sales   action
52499МСК    11202   2   1
52499МСК    11202   2   1

If a group has only two rows, as in the examples above, it should be scenario 7.

Recommended answer

If I understand you correctly, then the scenarios are as follows:

s1: 0100
s2: 010
s3: 000...1000...
s4: 01000 or 00010
s5: 001000 or 000100
s6: 0 or 1
s7: 01 or 00 or 10 or 11

If so, then conveniently each of these scenarios has a unique row count: s1 has 4 rows, s2 has 3, s3 has 7 or more, s4 has 5, s5 has 6, s6 has 1, and s7 has 2.
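Before relying on row counts, it can help to look at what each group actually contains. The following is a minimal sketch, not part of the original answer, that collapses each group's action column into a pattern string and counts its rows. The compact `data.frame()` rebuild of the sample data and the names `patterns`, `n_rows`, and `pattern` are illustrative assumptions.

```r
library(dplyr)

# Compact rebuild of the question's 28-row sample
# (assumption: Latin "MCK" in the code, as in the answer's copy of the data).
mydat <- data.frame(
  code   = "52382MCK",
  item   = rep(c(11709L, 11708L, 11710L), times = c(4, 3, 21)),
  sales  = rep(c(30L, 10L, 20L, 15L, 2L, 10L, 3L), times = 4),
  action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L,
             rep(c(0L, 0L, 0L, 1L, 0L, 0L, 0L), times = 3))
)

# One row per code + item group: its row count and its action pattern as a string.
patterns <- mydat %>%
  group_by(code, item) %>%
  summarise(n_rows  = n(),
            pattern = paste(action, collapse = ""),
            .groups = "drop")

print(patterns)
```

In the sample data, item 11710 has 21 rows because the pattern 0001000 occurs three times in a row, so a row-count rule still places it in s3 (7 or more rows).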

Using that fact, we can do the following:

library(dplyr)

mydat = structure(list(code = c("52382MCK", "52382MCK", "52382MCK", "52382MCK", 
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", 
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", 
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", 
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK"
), item = c(11709L, 11709L, 11709L, 11709L, 11708L, 11708L, 11708L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 
11710L, 11710L, 11710L, 11710L, 11710L), sales = c(30L, 10L, 
20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 
10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L), 
    action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 
    0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
    0L)), class = "data.frame", row.names = c(NA, -28L), .Names = c("code", 
"item", "sales", "action"))

# Count the rows in each code + item group, then map that count to a scenario number
mydat = mydat %>%
  group_by(code, item) %>%
  mutate(groups_item_count = n(),  # number of rows in the current group
         scenario = case_when(groups_item_count == 4 ~ 1,
                              groups_item_count == 3 ~ 2,
                              groups_item_count >= 7 ~ 3,
                              groups_item_count == 5 ~ 4,
                              groups_item_count == 6 ~ 5,
                              groups_item_count == 1 ~ 6,
                              groups_item_count == 2 ~ 7))

This adds a "scenario" column indicating which scenario (1 through 7) each row's group falls into. I would strongly suggest that you don't break your data frame up into multiple data frames. My guess is that dplyr's filter() and group_by() functions would be more efficient for whatever you want to do next. However, if you're adamant about splitting it into a separate data frame for each scenario, then you could do:

mydat1 = filter(mydat, scenario == 1)
mydat2 = filter(mydat, scenario == 2)
mydat3 = filter(mydat, scenario == 3)
mydat4 = filter(mydat, scenario == 4)
mydat5 = filter(mydat, scenario == 5)
mydat6 = filter(mydat, scenario == 6)
mydat7 = filter(mydat, scenario == 7)
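As a rough illustration of keeping everything in one data frame, the sketch below summarises per scenario with group_by(), and uses base R's split() for the case where separate pieces are still wanted. It is not part of the original answer; the compact data rebuild and the names `per_scenario`, `total_sales`, `n_items`, and `by_scenario` are assumptions made for the example.

```r
library(dplyr)

# Compact rebuild of the sample data (assumption: same 28 rows as above).
mydat <- data.frame(
  code   = "52382MCK",
  item   = rep(c(11709L, 11708L, 11710L), times = c(4, 3, 21)),
  sales  = rep(c(30L, 10L, 20L, 15L, 2L, 10L, 3L), times = 4),
  action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L,
             rep(c(0L, 0L, 0L, 1L, 0L, 0L, 0L), times = 3))
)

# Same row-count rule as in the answer, followed by ungroup() so that
# later steps are not silently evaluated per code + item group.
mydat <- mydat %>%
  group_by(code, item) %>%
  mutate(groups_item_count = n(),
         scenario = case_when(groups_item_count == 4 ~ 1,
                              groups_item_count == 3 ~ 2,
                              groups_item_count >= 7 ~ 3,
                              groups_item_count == 5 ~ 4,
                              groups_item_count == 6 ~ 5,
                              groups_item_count == 1 ~ 6,
                              groups_item_count == 2 ~ 7)) %>%
  ungroup()

# Work per scenario without creating mydat1 ... mydat7.
per_scenario <- mydat %>%
  group_by(scenario) %>%
  summarise(total_sales = sum(sales),
            n_items     = n_distinct(item),
            .groups     = "drop")
print(per_scenario)

# If separate data frames are still needed, one named list is easier to
# loop over than seven variables; only scenarios present in the data appear.
by_scenario <- split(mydat, mydat$scenario)
names(by_scenario)
```

Each element of the list can then be accessed by its scenario number as a string, for example by_scenario[["3"]].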
