每对的特殊分组号 [英] Special grouping number for each pairs

查看:103
本文介绍了每对的特殊分组号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这里已经回答了部分问题 special-group每个数据组合的数量。在大多数情况下,我们在数据内部具有对和其他数据值。我们要实现的是,如果存在这些对,则对那些组进行编号,并将它们编号直到下一对。

There is already some part of the question answered here special-group-number-for-each-combination-of-data. In most cases we have pairs and other data values inside the data. What we want to achieve is that number those groups if those pairs exist and number them until the next pairs.

当我集中每对时,例如 c( bad, good)希望将它们分组并配对 c('Veni', vidi, Vici)分配唯一编号 666

As I concentrated each pairs such as c("bad","good") would like to group them and for pairs c('Veni',"vidi","Vici") assign unique number 666.

以下是示例数据

names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")

  df <- data.frame(names)

这是真实情况和一般情况下的预期输出

Here is the real and general case expected output

     names  Group
1      bad    1
2     good    1
3        1    1
4        2    1
5     good    2
6      bad    2
7      111    2
8      bad    3
9  J.James    3
10    good    4
11 J.James    4
12     333    4
13 J.James    5
14    good    5
15     761    5
16    Veni    666
17    vidi    666
18    Vici    666


推荐答案

这里有两种方法可以为给定的样本数据集重现OP的预期结果。

Here are two approaches which reproduce OP's expected result for the given sample dataset.`

两者的工作方式相同。首先,将跳过所有令人烦扰的行,即不包含有效名称的行,并以2组为单位对具有有效名称的行进行简单编号。其次,为具有免除名称的行指定特殊的组数。最后, NA 行通过进行最后一个观察来填充。

Both work in the same way. First, all "disturbing" rows, i.e., rows which do not contain "valid" names, are skipped and the rows with "valid" names are simply numbered in groups of 2. Second, the rows with exempt names are given the special group number. Finally, the NA rows are filled by carrying the last observation forward.

library(data.table)
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
exempt <- c("Veni", "vidi", "Vici")
data.table(names)[is.na(as.numeric(names)) & !names %in% exempt, 
                  grp := rep(1:.N, each = 2L, length.out = .N)][
                    names %in% exempt, grp := 666L][
                      , grp := zoo::na.locf(grp)][]




      names grp
 1:     bad   1
 2:    good   1
 3:       1   1
 4:       2   1
 5:    good   2
 6:     bad   2
 7:     111   2
 8:     bad   3
 9: J.James   3
10:    good   4
11: J.James   4
12:     333   4
13: J.James   5
14:    good   5
15:     761   5
16:    Veni 666
17:    vidi 666
18:    Vici 666




dplyr / tidyr



我尝试提供 dplyr / tidyr 解决方案:

dplyr/tidyr

Here is my attempt to provide a dplyr/tidyr solution:

library(dplyr)
as_tibble(names) %>% 
  mutate(grp = if_else(is.na(as.numeric(names)) & !names %in% exempt,  
                       rep(1:n(), each = 2L, length.out = n()),
                       if_else(names %in% exempt, 666L, NA_integer_))) %>% 
  tidyr::fill(grp)




# A tibble: 18 x 2
   value     grp
   <chr>   <int>
 1 bad         1
 2 good        1
 3 1           1
 4 2           1
 5 good        3
 6 bad         3
 7 111         3
 8 bad         4
 9 J.James     5
10 good        5
11 J.James     6
12 333         6
13 J.James     7
14 good        7
15 761         7
16 Veni      666
17 vidi      666
18 Vici      666


这篇关于每对的特殊分组号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆