How to copy grouped rows into columns with dplyr/tidyverse in R?


Problem description

I am trying to copy sets of rows into columns using dplyr. The following is my data frame.

df <- data.frame(
    hid=c(1,1,1,1,2,2,2,2,2,3,3,3,3),
    mid=c(1,2,3,4,1,2,3,4,5,1,2,3,4),
    tmid=c("010","01010","010","01020",
           "010","0120","010","010","020",
           "010","01010","010","01020"),
    thid=c("010","02020","010","02020",
           "000","0120","010","010","010",
           "010","02020","010","02020")
    )

It prints in the following format:

> df
   hid mid  tmid  thid
1    1   1   010   010
2    1   2 01010 02020
3    1   3   010   010
4    1   4 01020 02020
5    2   1   010   000
6    2   2  0120  0120
7    2   3   010   010
8    2   4   010   010
9    2   5   020   010
10   3   1   010   010
11   3   2 01010 02020
12   3   3   010   010
13   3   4 01020 02020

My desired output is shown below:

     hid   mid   tmid   thid  tmid1  tmid2  tmid3  tmid4  tmid5  thid1  thid2  thid3  thid4  thid5
 * <dbl> <dbl> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
 1     1     1    010    010    010  01010    010  01020      0    010  02020    010  02020      0
 2     1     2  01010  02020    010  01010    010  01020      0    010  02020    010  02020      0
 3     1     3    010    010    010  01010    010  01020      0    010  02020    010  02020      0
 4     1     4  01020  02020    010  01010    010  01020      0    010  02020    010  02020      0
 5     2     1    010    000    010   0120    010    010    020    000   0120    010    010    010
 6     2     2   0120   0120    010   0120    010    010    020    000   0120    010    010    010
 7     2     3    010    010    010   0120    010    010    020    000   0120    010    010    010
 8     2     4    010    010    010   0120    010    010    020    000   0120    010    010    010
 9     2     5    020    010    010   0120    010    010    020    000   0120    010    010    010
10     3     1    010    010    010  01010    010  01020      0    010  02020    010  02020      0
11     3     2  01010  02020    010  01010    010  01020      0    010  02020    010  02020      0
12     3     3    010    010    010  01010    010  01020      0    010  02020    010  02020      0
13     3     4  01020  02020    010  01010    010  01020      0    010  02020    010  02020      0

  • Convert thid and tmid into columns.
  • The suffix in thid_x and tmid_x is defined by mid; however, the maximum value of mid is not fixed (it ranges from 1 to perhaps 8 in the actual large data set).
  • Within each hid group, every row carries the same thid_x and tmid_x values.
  • If a value does not exist, it should be padded with 0.

The idea of this manipulation is shown in the following figure.

I am currently trying to use spread, but it only fills the cell for each row's own mid, separately for thid or tmid. I need to fill the remaining <NA>s with the value that exists elsewhere in the same hid group.

      > df %>% mutate(id1=str_c("tmid",mid)) %>% group_by(hid) %>% spread(key=id1,value=tmid)
      # A tibble: 13 x 8
      # Groups:   hid [3]
           hid   mid   thid  tmid1  tmid2  tmid3  tmid4  tmid5
       * <dbl> <dbl> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
       1     1     1    010    010   <NA>   <NA>   <NA>   <NA>
       2     1     2  02020   <NA>  01010   <NA>   <NA>   <NA>
       3     1     3    010   <NA>   <NA>    010   <NA>   <NA>
       4     1     4  02020   <NA>   <NA>   <NA>  01020   <NA>
       5     2     1    000    010   <NA>   <NA>   <NA>   <NA>
       6     2     2   0120   <NA>   0120   <NA>   <NA>   <NA>
       7     2     3    010   <NA>   <NA>    010   <NA>   <NA>
       8     2     4    010   <NA>   <NA>   <NA>    010   <NA>
       9     2     5    010   <NA>   <NA>   <NA>   <NA>    020
      10     3     1    010    010   <NA>   <NA>   <NA>   <NA>
      11     3     2  02020   <NA>  01010   <NA>   <NA>   <NA>
      12     3     3    010   <NA>   <NA>    010   <NA>   <NA>
      13     3     4  02020   <NA>   <NA>   <NA>  01020   <NA>
      

Any suggestions?

Recommended answer

We can gather and then spread:

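      library(tidyverse)
      df1 %>% 
        # drop the extra columns that should not be widened
        select(-tdid, -tiid) %>% 
        # long format: one row per (hid, mid, key) with key in {tmid, thid}
        gather(key, val, tmid:thid) %>% 
        # build the new column names, e.g. "tmid" + 1 -> "tmid1"
        unite(keyn, key, mid, sep="")  %>%
        # one row per hid; missing combinations are padded with '0'
        spread(keyn, val, fill = '0') %>% 
        # attach the wide columns back to every original row (joined by hid)
        right_join(df1) %>%
        # original columns first, then the new ones
        select(names(df1), everything(), -tdid, -tiid)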
      # A tibble: 13 x 14
      #     hid   mid tmid  thid  thid1 thid2 thid3 thid4 thid5 tmid1 tmid2 tmid3
      #   <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
      # 1     1     1 010   010   010   02020 010   02020 0     010   01010 010  
      # 2     1     2 01010 02020 010   02020 010   02020 0     010   01010 010  
      # 3     1     3 010   010   010   02020 010   02020 0     010   01010 010  
      # 4     1     4 01020 02020 010   02020 010   02020 0     010   01010 010  
      # 5     2     1 010   000   000   0120  010   010   010   010   0120  010  
      # 6     2     2 0120  0120  000   0120  010   010   010   010   0120  010  
      # 7     2     3 010   010   000   0120  010   010   010   010   0120  010  
      # 8     2     4 010   010   000   0120  010   010   010   010   0120  010  
      # 9     2     5 020   010   000   0120  010   010   010   010   0120  010  
      #10     3     1 010   010   010   02020 010   02020 0     010   01010 010  
      #11     3     2 01010 02020 010   02020 010   02020 0     010   01010 010  
      #12     3     3 010   010   010   02020 010   02020 0     010   01010 010  
      #13     3     4 01020 02020 010   02020 010   02020 0     010   01010 010  
      # ... with 2 more variables: tmid4 <chr>, tmid5 <chr>
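
In recent tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. As a rough sketch of the same idea, assuming tidyr 1.1.0 or later (so that values_fill accepts a single value) and character rather than factor columns, the widening can be done in one pivot_wider call and joined back by hid. The names wide and result are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, stored as character (assumption: no factors)
df <- data.frame(
  hid  = c(1,1,1,1,2,2,2,2,2,3,3,3,3),
  mid  = c(1,2,3,4,1,2,3,4,5,1,2,3,4),
  tmid = c("010","01010","010","01020","010","0120","010","010","020",
           "010","01010","010","01020"),
  thid = c("010","02020","010","02020","000","0120","010","010","010",
           "010","02020","010","02020"),
  stringsAsFactors = FALSE
)

# One row per hid; column names are built as <value column><mid>, e.g. tmid1
wide <- df %>%
  pivot_wider(
    id_cols     = hid,
    names_from  = mid,
    values_from = c(tmid, thid),
    names_sep   = "",
    values_fill = "0"   # pad combinations that do not exist in a group
  )

# Copy the wide columns onto every original row of the same hid
result <- df %>%
  left_join(wide, by = "hid")
```

Because the new column names come from the values of mid, the number of tmid/thid columns follows the data and does not need to be fixed in advance.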
      

Data

      df1 <- structure(list(hid = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3), 
          mid = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4), tmid = c("010", 
          "01010", "010", "01020", "010", "0120", "010", "010", "020", 
          "010", "01010", "010", "01020"), thid = c("010", "02020", 
          "010", "02020", "000", "0120", "010", "010", "010", "010", 
          "02020", "010", "02020"), tdid = c("000", "01010", "010", 
          "02020", "000", "0100", "010", "010", "010", "000", "01010", 
          "010", "02020"), tiid = c("010", "02020", "010", "01020", 
          "020", "0220", "020", "020", "020", "010", "02020", "010", 
          "01020")), .Names = c("hid", "mid", "tmid", "thid", "tdid", 
      "tiid"), row.names = c(NA, -13L), class = c("tbl_df", "tbl", 
      "data.frame"))
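
The spread attempt from the question can also be completed rather than replaced. The sketch below is one possible way, not the answer's method: it assumes character columns, dplyr 1.0 or later (for across) and tidyr 1.0 or later (for .direction = "downup"), and handles only tmid; thid would follow the same pattern. The helper column val and the name filled are hypothetical.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, stored as character (assumption: no factors)
df <- data.frame(
  hid  = c(1,1,1,1,2,2,2,2,2,3,3,3,3),
  mid  = c(1,2,3,4,1,2,3,4,5,1,2,3,4),
  tmid = c("010","01010","010","01020","010","0120","010","010","020",
           "010","01010","010","01020"),
  thid = c("010","02020","010","02020","000","0120","010","010","010",
           "010","02020","010","02020"),
  stringsAsFactors = FALSE
)

filled <- df %>%
  # copy tmid into a helper column so the original tmid survives spread()
  mutate(id1 = paste0("tmid", mid), val = tmid) %>%
  spread(key = id1, value = val) %>%
  # within each hid, propagate the single non-NA value to all rows
  group_by(hid) %>%
  fill(matches("^tmid[0-9]+$"), .direction = "downup") %>%
  ungroup() %>%
  # anything still NA does not exist in that group: pad with "0"
  mutate(across(matches("^tmid[0-9]+$"), ~ replace_na(.x, "0")))
```

This keeps the 13 original rows throughout, at the cost of an extra fill step per widened variable, which is why the join-based approach tends to be simpler when several columns need widening.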
      
