Applying rep() within groups through dplyr
Question
I've been trying to replicate a binary output of 1 and 2 within groups. I'd like to use rep and dplyr, but I can't figure out how to apply rep within groups. I've managed to do it by manually separating the groups and specifying the correct range for each one, but I'd like to know how rep can be applied using dplyr.
Here is the sample data:
df <- data.frame(date = c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02"),
loc =c("AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD"),
cat = c("a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"))
This is essentially the code I run for each group, here applied to the entire dataset:
df$type <- rep(1:2,nrow(df)/2)
As you can see, the output ignores the cat column: the runs for cat b and d should have started at 1.
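The question mentions getting this to work by manually separating the groups. A base-R sketch of that idea (split(), lapply() and unsplit(); a reconstruction, not the asker's actual code) makes the requirement explicit: the 1/2 counter has to restart at the first row of every cat group, which a single rep() over the whole column cannot do. The data frame is rebuilt compactly with rep(), assuming the same 18 rows as the question's df.

```r
# Sample data from the question, rebuilt compactly with rep() (same 18 rows).
df <- data.frame(
  date = rep(c("2017-01-01", "2017-01-02"), times = c(7, 11)),
  loc  = rep(c("AB", "CD"), times = c(8, 10)),
  cat  = rep(c("a", "b", "c", "d"), times = c(3, 5, 5, 5)),
  stringsAsFactors = FALSE
)

# Split the rows by cat, restart the 1/2 sequence inside each piece,
# then put the pieces back together in the original row order.
pieces <- split(df, df$cat)
pieces <- lapply(pieces, function(g) {
  g$type <- rep(1:2, length.out = nrow(g))  # length.out copes with odd group sizes
  g
})
df <- unsplit(pieces, df$cat)

print(df)
```

This works, but it is exactly the manual bookkeeping the asker wanted to avoid, which is what group_by() does for you.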
date loc cat type
1 2017-01-01 AB a 1
2 2017-01-01 AB a 2
3 2017-01-01 AB a 1
4 2017-01-01 AB b 2
5 2017-01-01 AB b 1
6 2017-01-01 AB b 2
7 2017-01-01 AB b 1
8 2017-01-02 AB b 2
9 2017-01-02 CD c 1
10 2017-01-02 CD c 2
11 2017-01-02 CD c 1
12 2017-01-02 CD c 2
13 2017-01-02 CD c 1
14 2017-01-02 CD d 2
15 2017-01-02 CD d 1
16 2017-01-02 CD d 2
17 2017-01-02 CD d 1
UPDATE: Here's the desired output.
date loc cat type
1 2017-01-01 AB a 1
2 2017-01-01 AB a 2
3 2017-01-01 AB a 1
4 2017-01-01 AB b 1
5 2017-01-01 AB b 2
6 2017-01-01 AB b 1
7 2017-01-01 AB b 2
8 2017-01-02 AB b 1
9 2017-01-02 CD c 1
10 2017-01-02 CD c 2
11 2017-01-02 CD c 1
12 2017-01-02 CD c 2
13 2017-01-02 CD c 1
14 2017-01-02 CD d 1
15 2017-01-02 CD d 2
16 2017-01-02 CD d 1
17 2017-01-02 CD d 2
Answer
Assuming that cat is the only relevant grouping variable here (not date and loc), you can do:
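library(dplyr)
df <- df %>%
  group_by(cat) %>%                              # one 1/2 run per cat value
  mutate(type = rep(1:2, length.out = n())) %>%  # n() = number of rows in the current group
  ungroup()                                      # drop the grouping once type is computed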
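If dplyr isn't a requirement, the same per-group restart can be sketched in base R with ave() (a swapped-in technique, not part of the original answer). ave() splits its first argument by the grouping vector, applies FUN to each piece and returns the results in the original row order, so rep(1:2, length.out = ...) again starts at 1 for every cat value. The data frame below is assumed to match the question's df.

```r
# Sample data from the question, rebuilt compactly with rep() (same 18 rows).
df <- data.frame(
  date = rep(c("2017-01-01", "2017-01-02"), times = c(7, 11)),
  loc  = rep(c("AB", "CD"), times = c(8, 10)),
  cat  = rep(c("a", "b", "c", "d"), times = c(3, 5, 5, 5)),
  stringsAsFactors = FALSE
)

# Pass row indices as the values: FUN only needs each group's length,
# and ave() puts the per-group results back in the original row order.
df$type <- ave(seq_len(nrow(df)), df$cat,
               FUN = function(idx) rep(1:2, length.out = length(idx)))

print(df)
```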
# Output:
date loc cat type
<fctr> <fctr> <fctr> <int>
1 2017-01-01 AB a 1
2 2017-01-01 AB a 2
3 2017-01-01 AB a 1
4 2017-01-01 AB b 1
5 2017-01-01 AB b 2
6 2017-01-01 AB b 1
7 2017-01-01 AB b 2
8 2017-01-02 AB b 1
9 2017-01-02 CD c 1
10 2017-01-02 CD c 2
11 2017-01-02 CD c 1
12 2017-01-02 CD c 2
13 2017-01-02 CD c 1
14 2017-01-02 CD d 1
15 2017-01-02 CD d 2
16 2017-01-02 CD d 1
17 2017-01-02 CD d 2
18 2017-01-02 CD d 1
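The answer assumes only cat matters. If a new run should also begin whenever date or loc changes, all three columns go into the grouping (in dplyr, group_by(date, loc, cat)). The base-R sketch below, again using ave() as a stand-in and assuming the question's data, compares the two groupings. For this particular sample they produce the same type column: the only cat group that spans two dates is b, and its single 2017-01-02 row is the 5th row of b, an odd position, so it gets 1 either way. With data where a group crosses a date boundary at an even position, the two results would differ.

```r
# Sample data from the question, rebuilt compactly with rep() (same 18 rows).
df <- data.frame(
  date = rep(c("2017-01-01", "2017-01-02"), times = c(7, 11)),
  loc  = rep(c("AB", "CD"), times = c(8, 10)),
  cat  = rep(c("a", "b", "c", "d"), times = c(3, 5, 5, 5)),
  stringsAsFactors = FALSE
)

# Alternating 1, 2, 1, ... sequence of the same length as idx.
alternate <- function(idx) rep(1:2, length.out = length(idx))

# Restart the run only when cat changes (the answer's assumption) ...
by_cat <- ave(seq_len(nrow(df)), df$cat, FUN = alternate)
# ... versus restarting whenever any of date, loc or cat changes.
by_all <- ave(seq_len(nrow(df)), df$date, df$loc, df$cat, FUN = alternate)

print(identical(by_cat, by_all))
```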