Special grouping number for each pair
Question
Part of this question has already been answered in special-group-number-for-each-combination-of-data. In most cases the data contains pairs mixed with other values. The goal is to number the groups: each pair starts a new group, and that group number continues until the next pair.
I am focusing on pairs such as c("bad","good") and would like to group them, while the names c('Veni',"vidi","Vici") should be assigned the unique number 666.
Here is the sample data:
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
df <- data.frame(names)
Here is the expected output for the real, general case:
names Group
1 bad 1
2 good 1
3 1 1
4 2 1
5 good 2
6 bad 2
7 111 2
8 bad 3
9 J.James 3
10 good 4
11 J.James 4
12 333 4
13 J.James 5
14 good 5
15 761 5
16 Veni 666
17 vidi 666
18 Vici 666
Recommended answer
Here are two approaches for the given sample dataset. The data.table version reproduces OP's expected result exactly.
Both work in the same way. First, all "disturbing" rows, i.e., rows that do not contain "valid" names, are skipped, and the rows with "valid" names are simply numbered in groups of 2. Second, the rows with exempt names are given the special group number. Finally, the NA rows are filled by carrying the last observation forward.
library(data.table)
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
exempt <- c("Veni", "vidi", "Vici")
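# 1. number the rows with "valid" names (non-numeric, not exempt) in groups of 2
#    (as.numeric() warns about NAs introduced by coercion; that is expected here)
# 2. give the rows with exempt names the special group number
# 3. fill the remaining NA rows by carrying the last observation forward
data.table(names)[is.na(as.numeric(names)) & !names %in% exempt,
                  grp := rep(1:.N, each = 2L, length.out = .N)][
                    names %in% exempt, grp := 666L][
                      , grp := zoo::na.locf(grp)][]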
names grp
1: bad 1
2: good 1
3: 1 1
4: 2 1
5: good 2
6: bad 2
7: 111 2
8: bad 3
9: J.James 3
10: good 4
11: J.James 4
12: 333 4
13: J.James 5
14: good 5
15: 761 5
16: Veni 666
17: vidi 666
18: Vici 666
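To make the three steps easier to follow, here is a minimal base R sketch that performs them one at a time on the same sample data. The helper names `is_num`, `valid` and `idx` are introduced only for illustration, and the `cummax()` index trick stands in for `zoo::na.locf()`; it assumes the first row is never NA, which holds for this sample.

```r
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761,
           "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

# Step 1: number only the "valid" rows (non-numeric, not exempt) in groups of 2.
is_num <- !is.na(suppressWarnings(as.numeric(names)))
valid  <- !is_num & !names %in% exempt
grp <- rep(NA_integer_, length(names))
grp[valid] <- rep(seq_len(sum(valid)), each = 2L, length.out = sum(valid))

# Step 2: rows with exempt names get the special group number.
grp[names %in% exempt] <- 666L

# Step 3: carry the last observation forward.
# idx holds, for each row, the position of the most recent non-NA value
# (assumption: the first row is non-NA, otherwise idx would start at 0).
idx <- cummax(seq_along(grp) * !is.na(grp))
grp <- grp[idx]

df <- data.frame(names, grp)
```

Printing `grp` after each step shows how the NA gaps left by step 1 are closed by step 3.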
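dplyr / tidyr
Here is my attempt to provide a dplyr/tidyr solution. Note that its group numbers differ from the expected output: rep() is evaluated over all 18 rows rather than only the rows with valid names, so the numbering skips 2 and the pair in rows 8 and 9 is split across groups 4 and 5.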
library(dplyr)
as_tibble(names) %>%
  mutate(grp = if_else(is.na(as.numeric(names)) & !names %in% exempt,
                       rep(1:n(), each = 2L, length.out = n()),
                       if_else(names %in% exempt, 666L, NA_integer_))) %>%
  tidyr::fill(grp)
# A tibble: 18 x 2
value grp
<chr> <int>
1 bad 1
2 good 1
3 1 1
4 2 1
5 good 3
6 bad 3
7 111 3
8 bad 4
9 J.James 5
10 good 5
11 J.James 6
12 333 6
13 J.James 7
14 good 7
15 761 7
16 Veni 666
17 vidi 666
18 Vici 666
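One possible way to make a dplyr/tidyr pipeline count only the valid rows is to derive the pair number from a running count with `cumsum()`. The following is a sketch under that assumption, not the original author's method; the helper column `valid` and the object `result` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)

names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761,
           "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

result <- tibble(names = names) %>%
  mutate(
    # TRUE for non-numeric, non-exempt rows
    valid = is.na(suppressWarnings(as.numeric(names))) & !names %in% exempt,
    grp = case_when(
      # k-th valid row gets pair number (k + 1) %/% 2, i.e. 1, 1, 2, 2, ...
      valid ~ as.integer((cumsum(valid) + 1L) %/% 2L),
      names %in% exempt ~ 666L,
      TRUE ~ NA_integer_
    )
  ) %>%
  fill(grp) %>%        # carry the last observation forward
  select(-valid)
```

Because `cumsum(valid)` only increases on valid rows, numeric rows between the two members of a pair do not disturb the numbering.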