How to copy grouped rows into columns with dplyr/tidyverse in R?

Posted: 2020/7/8 20:21:21  Tags: r dplyr type-conversion tidyverse spread


Question

I am trying to copy sets of rows into columns using dplyr. My data frame is as follows.

df <- data.frame(
    hid=c(1,1,1,1,2,2,2,2,2,3,3,3,3),
    mid=c(1,2,3,4,1,2,3,4,5,1,2,3,4),
    tmid=c("010","01010","010","01020",
           "010","0120","010","010","020",
           "010","01010","010","01020"),
    thid=c("010","02020","010","02020",
           "000","0120","010","010","010",
           "010","02020","010","02020")
    )

It prints in the following format:

> df
   hid mid  tmid  thid
1    1   1   010   010
2    1   2 01010 02020
3    1   3   010   010
4    1   4 01020 02020
5    2   1   010   000
6    2   2  0120  0120
7    2   3   010   010
8    2   4   010   010
9    2   5   020   010
10   3   1   010   010
11   3   2 01010 02020
12   3   3   010   010
13   3   4 01020 02020

My desired output is shown below:

     hid   mid   tmid   thid  tmid1  tmid2  tmid3  tmid4  tmid5  thid1  thid2  thid3  thid4  thid5
 * <dbl> <dbl> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
 1     1     1    010    010    010  01010    010  01020      0    010  02020    010  02020      0
 2     1     2  01010  02020    010  01010    010  01020      0    010  02020    010  02020      0
 3     1     3    010    010    010  01010    010  01020      0    010  02020    010  02020      0
 4     1     4  01020  02020    010  01010    010  01020      0    010  02020    010  02020      0
 5     2     1    010    000    010   0120    010    010    020    000   0120    010    010    010
 6     2     2   0120   0120    010   0120    010    010    020    000   0120    010    010    010
 7     2     3    010    010    010   0120    010    010    020    000   0120    010    010    010
 8     2     4    010    010    010   0120    010    010    020    000   0120    010    010    010
 9     2     5    020    010    010   0120    010    010    020    000   0120    010    010    010
10     3     1    010    010    010  01010    010  01020      0    010  02020    010  02020      0
11     3     2  01010  02020    010  01010    010  01020      0    010  02020    010  02020      0
12     3     3    010    010    010  01010    010  01020      0    010  02020    010  02020      0
13     3     4  01020  02020    010  01010    010  01020      0    010  02020    010  02020      0

Convert thid and tmid into columns.
The suffix in thid_x and tmid_x is defined by mid; however, the maximum value of mid is not fixed (it ranges from 1 to perhaps 8 in the actual large data set).
The same values of thid_x and tmid_x are set for every row within a group of hid.
If a value does not exist, it should be padded with 0.

The idea of this manipulation is shown in the following figure.

I am currently trying to use spread, but it fills only the column that matches each row's own mid, for either thid or tmid, and leaves the rest as <NA>. I need to fill the remaining <NA>s with the values that exist elsewhere in the same hid group.

> df %>% mutate(id1=str_c("tmid",mid)) %>% group_by(hid) %>% spread(key=id1,value=tmid)
# A tibble: 13 x 8
# Groups:   hid [3]
     hid   mid   thid  tmid1  tmid2  tmid3  tmid4  tmid5
 * <dbl> <dbl> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
 1     1     1    010    010   <NA>   <NA>   <NA>   <NA>
 2     1     2  02020   <NA>  01010   <NA>   <NA>   <NA>
 3     1     3    010   <NA>   <NA>    010   <NA>   <NA>
 4     1     4  02020   <NA>   <NA>   <NA>  01020   <NA>
 5     2     1    000    010   <NA>   <NA>   <NA>   <NA>
 6     2     2   0120   <NA>   0120   <NA>   <NA>   <NA>
 7     2     3    010   <NA>   <NA>    010   <NA>   <NA>
 8     2     4    010   <NA>   <NA>   <NA>    010   <NA>
 9     2     5    010   <NA>   <NA>   <NA>   <NA>    020
10     3     1    010    010   <NA>   <NA>   <NA>   <NA>
11     3     2  02020   <NA>  01010   <NA>   <NA>   <NA>
12     3     3    010   <NA>   <NA>    010   <NA>   <NA>
13     3     4  02020   <NA>   <NA>   <NA>  01020   <NA>

Any suggestions?

Recommended answer

We can gather the two columns into long format and then spread them back out:

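library(tidyverse)
# df1 is the data frame defined in the Data section
df1 %>% 
  select(-tdid, -tiid) %>%               # drop the columns that are not reshaped
  gather(key, val, tmid:thid) %>%        # long format: one row per hid, mid and key
  unite(keyn, key, mid, sep="")  %>%     # build the new column names, e.g. tmid1, thid2
  spread(keyn, val, fill = '0') %>%      # one row per hid; missing cells become '0'
  right_join(df1) %>%                    # joins on hid, repeating the wide row for each original row
  select(names(df1), everything(), -tdid, -tiid)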
# A tibble: 13 x 14
#     hid   mid tmid  thid  thid1 thid2 thid3 thid4 thid5 tmid1 tmid2 tmid3
#   <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1     1     1 010   010   010   02020 010   02020 0     010   01010 010  
# 2     1     2 01010 02020 010   02020 010   02020 0     010   01010 010  
# 3     1     3 010   010   010   02020 010   02020 0     010   01010 010  
# 4     1     4 01020 02020 010   02020 010   02020 0     010   01010 010  
# 5     2     1 010   000   000   0120  010   010   010   010   0120  010  
# 6     2     2 0120  0120  000   0120  010   010   010   010   0120  010  
# 7     2     3 010   010   000   0120  010   010   010   010   0120  010  
# 8     2     4 010   010   000   0120  010   010   010   010   0120  010  
# 9     2     5 020   010   000   0120  010   010   010   010   0120  010  
#10     3     1 010   010   010   02020 010   02020 0     010   01010 010  
#11     3     2 01010 02020 010   02020 010   02020 0     010   01010 010  
#12     3     3 010   010   010   02020 010   02020 0     010   01010 010  
#13     3     4 01020 02020 010   02020 010   02020 0     010   01010 010  
# ... with 2 more variables: tmid4 <chr>, tmid5 <chr>
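
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same idea using pivot_wider(), not the code from the answer above. It assumes tidyr 1.0.0 or later, and it rebuilds the question's four columns inline under the hypothetical name df_demo so that it runs on its own.

```r
library(dplyr)
library(tidyr)

# The question's data, rebuilt inline (df_demo is a hypothetical name).
# stringsAsFactors = FALSE keeps tmid/thid as character in R < 4.0 as well.
df_demo <- data.frame(
  hid  = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3),
  mid  = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4),
  tmid = c("010", "01010", "010", "01020",
           "010", "0120", "010", "010", "020",
           "010", "01010", "010", "01020"),
  thid = c("010", "02020", "010", "02020",
           "000", "0120", "010", "010", "010",
           "010", "02020", "010", "02020"),
  stringsAsFactors = FALSE
)

# One row per hid, with one tmid<mid> and one thid<mid> column per mid value.
# Combinations that never occur (e.g. hid 1 with mid 5) are filled with "0".
wide <- df_demo %>%
  pivot_wider(
    id_cols     = hid,
    names_from  = mid,
    values_from = c(tmid, thid),
    names_sep   = "",
    values_fill = list(tmid = "0", thid = "0")
  )

# Attach each group's wide row to every original row of that group
result <- left_join(df_demo, wide, by = "hid")
print(result)
```

Because the column names are derived from whatever values mid takes, the sketch does not need to know the maximum mid in advance, which matches the requirement that the number of suffixes is not fixed.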

Data

df1 <- structure(list(hid = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3), 
    mid = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4), tmid = c("010", 
    "01010", "010", "01020", "010", "0120", "010", "010", "020", 
    "010", "01010", "010", "01020"), thid = c("010", "02020", 
    "010", "02020", "000", "0120", "010", "010", "010", "010", 
    "02020", "010", "02020"), tdid = c("000", "01010", "010", 
    "02020", "000", "0100", "010", "010", "010", "000", "01010", 
    "010", "02020"), tiid = c("010", "02020", "010", "01020", 
    "020", "0220", "020", "020", "020", "010", "02020", "010", 
    "01020")), .Names = c("hid", "mid", "tmid", "thid", "tdid", 
"tiid"), row.names = c(NA, -13L), class = c("tbl_df", "tbl", 
"data.frame"))
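
To see what each step of the gather/unite/spread pipeline contributes, it can help to run the steps one at a time on a small input. The sketch below uses a hypothetical two-group subset named small (not part of the original post) and stores every intermediate result; the names long, named and wide are likewise only illustrative.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset: hid 1 has mid 1..2, hid 2 has mid 1..3
small <- data.frame(
  hid  = c(1, 1, 2, 2, 2),
  mid  = c(1, 2, 1, 2, 3),
  tmid = c("010", "01010", "010", "0120", "010"),
  thid = c("010", "02020", "000", "0120", "010"),
  stringsAsFactors = FALSE
)

# Step 1: long format, one row per hid, mid and key ("tmid" or "thid")
long <- small %>% gather(key, val, tmid:thid)

# Step 2: combine key and mid into the future column name, e.g. "tmid1"
named <- long %>% unite(keyn, key, mid, sep = "")

# Step 3: one row per hid; cells with no source row (hid 1, mid 3) become "0"
wide <- named %>% spread(keyn, val, fill = "0")

print(long)
print(named)
print(wide)
```

Joining wide back onto small by hid then repeats each group's row for every original row, which is the role right_join() plays in the answer.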
