Selecting groups by their scenario in R
Problem description
Here is the data:
mydat=structure(list(code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "52382МСК", class = "factor"),
item = c(11709L, 11709L, 11709L, 11709L, 11708L, 11708L,
11708L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L), sales = c(30L,
10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L,
3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L,
2L, 10L, 3L), action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L)), .Names = c("code", "item", "sales",
"action"), class = "data.frame", row.names = c(NA, -28L))
I have 3 groups, defined by the variables code + item. Here are the 3 groups:
code item
52382МСК 11709
52382МСК 11708
52382МСК 11710
I also have an action column. It can take only two values: zero (0) or one (1).
Each group represents one of 3 scenarios:
52382МСК 11709
This is the scenario where the action column has one 0 before the first 1 and two 0s after it. Note: it may also be the scenario with two 0s before the first 1 and one 0 after it.
52382МСК 11708
This is the scenario with one 0 before the first 1 and one 0 after it.
52382МСК 11710
This is the scenario with three (or more) 0s before the first 1 and three (or more) 0s after it.
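The scenarios above are all stated in terms of two numbers per group: how many 0s come before the first 1, and how many 0s come after it. A minimal base-R sketch that computes both numbers for each code + item group follows. The helper name `zeros_around_first_one` is hypothetical, the data is a compact rebuild of `mydat` with a Latin "MCK" label, and counting every 0 after the first 1 (even past a later 1) is an assumption about how to read groups such as item 11710.

```r
# Compact rebuild of the question's mydat (Latin "MCK" instead of the Cyrillic label)
mydat <- data.frame(
  code   = "52382MCK",
  item   = rep(c(11709L, 11708L, 11710L), times = c(4, 3, 21)),
  sales  = rep(c(30L, 10L, 20L, 15L, 2L, 10L, 3L), 4),
  action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, rep(c(0L, 0L, 0L, 1L, 0L, 0L, 0L), 3))
)

# Hypothetical helper: number of 0s before the first 1 and number of 0s after it.
# Returns NA for both values if the group contains no 1 at all.
zeros_around_first_one <- function(action) {
  first <- match(1L, action)
  if (is.na(first)) {
    return(c(before = NA_integer_, after = NA_integer_))
  }
  c(before = first - 1L, after = sum(action[-seq_len(first)] == 0L))
}

# One row per code+item group; row names look like "52382MCK.11709"
groups <- split(mydat$action, list(mydat$code, mydat$item), drop = TRUE)
counts <- do.call(rbind, lapply(groups, zeros_around_first_one))
counts
```

For the example data this gives 1 before / 2 after for item 11709, 1 / 1 for item 11708 and 3 / 15 for item 11710.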
How can I select the groups that fall into each scenario? I.e. mydat1 should hold the groups with the first scenario, mydat2 the groups with the second scenario, and mydat3 the groups with the third scenario.
Sample output:
mydat1
code item sales action
52382МСК 11709 30 0
52382МСК 11709 10 1
52382МСК 11709 20 0
52382МСК 11709 15 0
mydat2
code item sales action
52382МСК 11708 2 0
52382МСК 11708 10 1
52382МСК 11708 3 0
mydat3
code item sales action
52382МСК 11710 30 0
52382МСК 11710 10 0
52382МСК 11710 20 0
52382МСК 11710 15 1
52382МСК 11710 2 0
52382МСК 11710 10 0
52382МСК 11710 3 0
52382МСК 11710 30 0
52382МСК 11710 10 0
52382МСК 11710 20 0
52382МСК 11710 15 1
52382МСК 11710 2 0
52382МСК 11710 10 0
52382МСК 11710 3 0
52382МСК 11710 30 0
52382МСК 11710 10 0
52382МСК 11710 20 0
52382МСК 11710 15 1
52382МСК 11710 2 0
52382МСК 11710 10 0
52382МСК 11710 3 0
Edit
I forgot: there can also be a scenario with one 0 before the first 1 and three 0s after it, or with three 0s before the first 1 and one 0 after it.
(mydat4)
There can also be a scenario with two 0s before the first 1 and three 0s after it, or with three 0s before the first 1 and two 0s after it.
(mydat5)
IE
I found another group that has only one row:
code item sales action
52499МСК 11202 2 0
How can I handle this so that a group with only one row counts as scenario 6?
code item sales action
52499МСК 11202 2 0
52499МСК 11202 2 1
or
code item sales action
52499МСК 11202 2 0
52499МСК 11202 2 0
or
code item sales action
52499МСК 11202 2 1
52499МСК 11202 2 1
If a group has only two rows, it is scenario 7.
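Taken together, the question describes seven scenarios. A minimal sketch, assuming each group's action column can be read as a string of 0s and 1s, matches that string against one regular expression per scenario. The pattern table and the helper `classify_pattern` are hypothetical; in particular, reading scenario 3 as "3+ zeros, a 1, anything, 3+ zeros" is an assumption that lets a group with several 1s, such as item 11710, still count as scenario 3.

```r
# Compact rebuild of the question's mydat (Latin "MCK" instead of the Cyrillic label)
mydat <- data.frame(
  code   = "52382MCK",
  item   = rep(c(11709L, 11708L, 11710L), times = c(4, 3, 21)),
  sales  = rep(c(30L, 10L, 20L, 15L, 2L, 10L, 3L), 4),
  action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, rep(c(0L, 0L, 0L, 1L, 0L, 0L, 0L), 3))
)

# Hypothetical pattern table: one regular expression per scenario.
# Assumption: scenario 3 is "3+ zeros, a 1, anything, 3+ zeros".
patterns <- c(
  "1" = "^(0100|0010)$",
  "2" = "^010$",
  "3" = "^0{3,}1.*0{3,}$",
  "4" = "^(01000|00010)$",
  "5" = "^(001000|000100)$",
  "6" = "^[01]$",
  "7" = "^[01]{2}$"
)

# Collapse one group's action values into a string like "0100" and return the
# number of the first scenario whose pattern matches, or NA if none does
classify_pattern <- function(action) {
  s <- paste(action, collapse = "")
  hit <- names(patterns)[vapply(patterns, grepl, logical(1), x = s)]
  if (length(hit) == 0) NA_integer_ else as.integer(hit[1])
}

# Apply per code+item group; ave() repeats each group's result on all its rows
mydat$scenario <- ave(mydat$action, mydat$code, mydat$item, FUN = classify_pattern)
unique(mydat[, c("code", "item", "scenario")])
```

Because it looks at the action values themselves, this returns NA for a group whose sequence fits none of the described scenarios (for example 1000), which makes unexpected groups easy to spot.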
Recommended answer
If I understand you correctly, the scenarios are as follows:
s1: 0100 or 0010
s2: 010
s3: 000...1000...
s4: 01000 or 00010
s5: 001000 or 000100
s6: 0 or 1
s7: 01 or 00 or 10 or 11
If so, then conveniently each of these scenarios has a unique count of rows. s1 is 4 rows, s2 is 3, s3 is 7+, s4 is 5, s5 is 6, s6 is 1, and s7 is 2.
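The row-count idea can also be written without dplyr. A minimal base-R sketch counts the rows of each code + item group and looks the count up in a vector whose k-th element is the scenario of a k-row group, with every group of 7 or more rows mapped to s3. The name `scenario_by_count` is hypothetical, and the data is a compact rebuild of `mydat` with a Latin "MCK" label.

```r
# Compact rebuild of the question's mydat (Latin "MCK" instead of the Cyrillic label)
mydat <- data.frame(
  code   = "52382MCK",
  item   = rep(c(11709L, 11708L, 11710L), times = c(4, 3, 21)),
  sales  = rep(c(30L, 10L, 20L, 15L, 2L, 10L, 3L), 4),
  action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, rep(c(0L, 0L, 0L, 1L, 0L, 0L, 0L), 3))
)

# Number of rows in each code+item group, repeated on every row of that group
n_rows <- ave(seq_len(nrow(mydat)), mydat$code, mydat$item, FUN = length)

# Hypothetical lookup: element k is the scenario of a k-row group
# (1 row -> s6, 2 -> s7, 3 -> s2, 4 -> s1, 5 -> s4, 6 -> s5);
# pmin() sends every group of 7 or more rows to element 7, i.e. s3
scenario_by_count <- c(6L, 7L, 2L, 1L, 4L, 5L, 3L)
mydat$scenario <- scenario_by_count[pmin(n_rows, 7L)]

table(mydat$item, mydat$scenario)
```

Note that this rule never inspects the action column, so it is only as reliable as the assumption that every group really follows one of the listed patterns.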
Using that fact, we can do the following:
library(dplyr)
mydat = structure(list(code = c("52382MCK", "52382MCK", "52382MCK", "52382MCK",
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK",
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK",
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK",
"52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK", "52382MCK"
), item = c(11709L, 11709L, 11709L, 11709L, 11708L, 11708L, 11708L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L, 11710L,
11710L, 11710L, 11710L, 11710L, 11710L), sales = c(30L, 10L,
20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L, 30L,
10L, 20L, 15L, 2L, 10L, 3L, 30L, 10L, 20L, 15L, 2L, 10L, 3L),
action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L)), class = "data.frame", row.names = c(NA, -28L), .Names = c("code",
"item", "sales", "action"))
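# Count the rows in each code+item group and map that count to a scenario number
mydat = mydat %>%
  group_by(code, item) %>%
  mutate(groups_item_count = n(),
         scenario = case_when(groups_item_count == 4 ~ 1,    # s1: 4 rows
                              groups_item_count == 3 ~ 2,    # s2: 3 rows
                              groups_item_count >= 7 ~ 3,    # s3: 7 or more rows
                              groups_item_count == 5 ~ 4,    # s4: 5 rows
                              groups_item_count == 6 ~ 5,    # s5: 6 rows
                              groups_item_count == 1 ~ 6,    # s6: 1 row
                              groups_item_count == 2 ~ 7)) %>%  # s7: 2 rows
  ungroup()  # drop the code+item grouping once the scenario column is in place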
This will add a "scenario" column indicating which scenario (1 through 7) each group belongs to. I would strongly suggest that you don't break your data frame up into multiple data frames; I'd be willing to guess that dplyr's filter() and group_by() functions will be more efficient for whatever you want to do next. However, if you're adamant about splitting it into a separate data frame for each scenario, you could do:
mydat1 = filter(mydat, scenario == 1)
mydat2 = filter(mydat, scenario == 2)
mydat3 = filter(mydat, scenario == 3)
mydat4 = filter(mydat, scenario == 4)
mydat5 = filter(mydat, scenario == 5)
mydat6 = filter(mydat, scenario == 6)
mydat7 = filter(mydat, scenario == 7)
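If separate data frames are really needed, base R's split() can do the same job in one call: it returns a named list with one data frame per scenario value that actually occurs. A minimal sketch follows; it recomputes the scenario column with the same row-count rule so that it runs on its own, and the name `by_scenario` is hypothetical.

```r
# Compact rebuild of the question's mydat (Latin "MCK" instead of the Cyrillic label)
mydat <- data.frame(
  code   = "52382MCK",
  item   = rep(c(11709L, 11708L, 11710L), times = c(4, 3, 21)),
  sales  = rep(c(30L, 10L, 20L, 15L, 2L, 10L, 3L), 4),
  action = c(0L, 1L, 0L, 0L, 0L, 1L, 0L, rep(c(0L, 0L, 0L, 1L, 0L, 0L, 0L), 3))
)

# Scenario from the row-count rule (7+ rows -> s3), as in the dplyr code above
n_rows <- ave(seq_len(nrow(mydat)), mydat$code, mydat$item, FUN = length)
mydat$scenario <- c(6L, 7L, 2L, 1L, 4L, 5L, 3L)[pmin(n_rows, 7L)]

# One data frame per scenario that occurs; list names are "1", "2", ...
by_scenario <- split(mydat, mydat$scenario)
names(by_scenario)
by_scenario[["1"]]  # the same rows that filter(mydat, scenario == 1) returns

# Optional: turn the list into separate objects mydat1, mydat2, ...
invisible(list2env(setNames(by_scenario, paste0("mydat", names(by_scenario))),
                   envir = environment()))
```

Only scenarios present in the data get a list element (here 1, 2 and 3), so `by_scenario[["4"]]` is NULL for this example rather than an empty data frame.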