Applying rep() within groups through dplyr

Question

I've been trying to replicate a binary output of 1 and 2 within groups. I'd like to use rep and dplyr, but I can't work out how to apply rep within groups. So far I've only managed it by manually separating the groups and specifying the correct range for each one. I would like to know how rep could be applied using dplyr.

Here's the sample data.

df <- data.frame(date = c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02", "2017-01-02"),
                 loc =c("AB", "AB", "AB", "AB", "AB", "AB", "AB", "AB", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD", "CD"),
                 cat = c("a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "d", "d", "d", "d", "d"))

This is basically the code I run for each group, applied here to the entire dataset.

df$type <- rep(1:2,nrow(df)/2)

As you can see, the output disregards the cat column. Categories b and d should each have started at 1.

         date loc cat type
1  2017-01-01  AB   a    1
2  2017-01-01  AB   a    2
3  2017-01-01  AB   a    1
4  2017-01-01  AB   b    2
5  2017-01-01  AB   b    1
6  2017-01-01  AB   b    2
7  2017-01-01  AB   b    1
8  2017-01-02  AB   b    2
9  2017-01-02  CD   c    1
10 2017-01-02  CD   c    2
11 2017-01-02  CD   c    1
12 2017-01-02  CD   c    2
13 2017-01-02  CD   c    1
14 2017-01-02  CD   d    2
15 2017-01-02  CD   d    1
16 2017-01-02  CD   d    2
17 2017-01-02  CD   d    1
18 2017-01-02  CD   d    2
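
The pattern runs straight through the group boundaries because rep() is given only the total row count and never sees cat. A minimal base R sketch of that behaviour, using a small made-up vector `grp` that is not part of the original data:

```r
# Hypothetical toy grouping vector, for illustration only
grp <- c("a", "a", "a", "b", "b")

# Filling by total length ignores the groups, so "b" starts at 2
whole <- rep(1:2, length.out = length(grp))
whole
# 1 2 1 2 1

# A non-integer `times` is truncated towards zero, so the
# nrow / 2 approach comes up one element short for odd lengths
short <- rep(1:2, length(grp) / 2)
length(short)
# 4
```

Using `length.out` instead of a computed `times` avoids the length mismatch, but the restart at each group still has to come from a grouped operation.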

UPDATE: Here's the desired output.

         date loc cat type
1  2017-01-01  AB   a    1
2  2017-01-01  AB   a    2
3  2017-01-01  AB   a    1
4  2017-01-01  AB   b    1
5  2017-01-01  AB   b    2
6  2017-01-01  AB   b    1
7  2017-01-01  AB   b    2
8  2017-01-02  AB   b    1
9  2017-01-02  CD   c    1
10 2017-01-02  CD   c    2
11 2017-01-02  CD   c    1
12 2017-01-02  CD   c    2
13 2017-01-02  CD   c    1
14 2017-01-02  CD   d    1
15 2017-01-02  CD   d    2
16 2017-01-02  CD   d    1
17 2017-01-02  CD   d    2
18 2017-01-02  CD   d    1

Recommended answer

Assuming that cat is the only relevant grouping variable here (not date and loc), you can do:

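library(dplyr)

# Restart the 1, 2, 1, 2, ... pattern within each level of cat;
# n() is the number of rows in the current group
df <- df %>%
    group_by(cat) %>%
    mutate(type = rep(1:2, length.out = n()))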
# Output:
         date    loc    cat  type
       <fctr> <fctr> <fctr> <int>
1  2017-01-01     AB      a     1
2  2017-01-01     AB      a     2
3  2017-01-01     AB      a     1
4  2017-01-01     AB      b     1
5  2017-01-01     AB      b     2
6  2017-01-01     AB      b     1
7  2017-01-01     AB      b     2
8  2017-01-02     AB      b     1
9  2017-01-02     CD      c     1
10 2017-01-02     CD      c     2
11 2017-01-02     CD      c     1
12 2017-01-02     CD      c     2
13 2017-01-02     CD      c     1
14 2017-01-02     CD      d     1
15 2017-01-02     CD      d     2
16 2017-01-02     CD      d     1
17 2017-01-02     CD      d     2
18 2017-01-02     CD      d     1
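
If date or loc should also define the groups, the same idea extends by listing more columns in group_by(). The following is only a sketch under that assumption, using a small made-up data frame `toy` rather than the data from the question:

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(
  date = c("2017-01-01", "2017-01-01", "2017-01-01",
           "2017-01-02", "2017-01-02", "2017-01-02"),
  cat  = c("a", "a", "b", "b", "b", "b")
)

out <- toy %>%
  group_by(date, cat) %>%                       # one group per date/cat pair
  mutate(type = rep(1:2, length.out = n())) %>% # restart at 1 in each group
  ungroup()                                     # drop grouping afterwards

out$type
# 1 2 1 1 2 1
```

Here cat "b" restarts at 1 on the second date because each date/cat combination is treated as its own group; whether that is wanted depends on the data.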
