Gathering specific pairs of columns into rows with dplyr in R


Problem description

I am trying to convert a data frame from wide to long format by gathering specific pairs of columns, as in the example below:

Example data frame

df <- data.frame(id=c(1,2,3,4,5), var=c("a","d","g","f","i"),a1=c(3,5,1,2,2), b1=c(2,4,1,2,3), a2=c(8,1,2,5,1), b2=c(1,6,4,7,2), a3=c(7,7,2,3,1), b3=c(1,1,4,9,6))

Initial table:

  id var a1 b1 a2 b2 a3 b3
1  1   a  3  2  8  1  7  1
2  2   d  5  4  1  6  7  1
3  3   g  1  1  2  4  2  4
4  4   f  2  2  5  7  3  9
5  5   i  2  3  1  2  1  6

Desired result:

   id var a b
1   1   a 3 2
2   1   a 8 1
3   1   a 7 1
4   2   d 5 4
5   2   d 1 6
6   2   d 7 1
7   3   g 1 1
8   3   g 2 4
9   3   g 2 4
10  4   f 2 2
11  4   f 5 7
12  4   f 3 9
13  5   i 2 3
14  5   i 1 2
15  5   i 1 6

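Conditions:

  • Pairs of ai and bi should be gathered together: there are 3 pairs of a and b ("a1 and b1", "a2 and b2" and "a3 and b3"), so the values in these pairs should be moved into a single pair of columns "a and b" by replicating each record three times
  • The first and second fields (the id of each sample and its common variable) should be kept in every replicated row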

I thought this could be done with gather() from the tidyverse. As far as I understand, however, gather() may not be suitable for gathering such specific pairs of fields into several specific columns (two columns in this case).

It is possible to prepare three data frames separately and bind them into one (example script below), but I would prefer to do it in one continuous tidyverse pipe, without interrupting the chain of operations.

df1 <- df %>% dplyr::select(id,var,a1,b1)
df2 <- df %>% dplyr::select(id,var,a2,b2)
df3 <- df %>% dplyr::select(id,var,a3,b3)
df.fin <- bind_rows(df1,df2,df3)

I would appreciate any elegant suggestions using the tidyverse.

================= Additional Questions ==================

@Akrun & Camille: Thank you for your suggestions, and sorry for my late reply. I am now trying to apply your ideas to my actual data frame, but I am still struggling with another issue.

The following are the column names in the actual data frame (sorry, I have not included any values for the columns, as they should not matter here).

colnames(df) <- c("hid","mid","rel","age","gen","mlic","vlic",
                  "wtaz","staz","ocp","ocpot","emp","empot","expm",
                  "minc","otaz1","op1","dtime1","atime1","dp1","dtaz1",
                  "pur1", "repm1","lg1t1","lg2t1","lg3t1","lg4t1","expt1",
                  "otaz2","op2","dtime2","atime2","dp2","dtaz2","pur2",
                  "repm2","lg1t2","lg2t2","lg3t2","lg4t2","expt2",
                  "otaz3","op3","dtime3","atime3","dp3","dtaz3","pur3",
                  "repm3","lg1t3","lg2t3","lg3t3","lg4t3","expt3",
                  "otaz4","op4","dtime4","atime4","dp4","dtaz4","pur4",
                  "repm4","lg1t4","lg2t4","lg3t4","lg4t4","expt4",
                  "otaz5","op5","dtime5","atime5","dp5","dtaz5","pur5",
                  "repm5","lg1t5","lg2t5","lg3t5","lg4t5","expt5"
                  )

I then tried to apply your suggestions as shown below. In the data frame, columns 1:15 are common variables and the others are repeated variables with 5 repetitions (the index 1 to 5 sits at the end of each variable name). I could run the following script, but I still have a problem:

#### Convert member table into activity table
## Common variables
hm.com <- names(hm)[c(1:15)]
## Repeating variables
hm.rep <- names(hm)[c(-1:-15)]
hm.rename <- unique(sub("\\d+$","",hm.rep))
## Extract members with trips
hm.trip <- hm %>% filter(otaz!=0) %>% data.frame()
## Convert from member into trip table
test <- split(hm.rep, sub(".*[^1-9$]", "", hm.rep)) %>%
    map_df(~ hm.trip %>% dplyr::select(hm.com, .x)) %>% 
    rename_at(16:28, ~ hm.rename) %>%
    arrange(hid,mid,dtime,atime) %>%
    data.frame()

The result still has a problem:

I could rename the first set of repeated variables, but the fields for sets 2 to 5 are still there and the records are not stored properly in the data frame. That is, a set of repeated variables, for instance otaz2 to expt2, is not stored as additional rows under otaz to expt but stays in its original position (otaz2 to expt2). I suppose map_df is not working correctly in my case.

========== Problem Solved ==========

The script above contained an incorrect operation: rename_at() was applied after map_df() had already bound the pieces together, rather than to each piece inside map_df().

Incorrect:

map_df(~ hm.trip %>% dplyr::select(hm.com, .x)) %>% 
        rename_at(16:28, ~ hm.rename)

Correct:

map_df(~ hm.trip %>% dplyr::select(hm.com, .x) %>% 
        rename_at(16:28, ~ hm.rename))

Thank you, I can now move on to the next step.
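
The difference between the two versions can be reproduced on the small example data frame from the top of the question. This is only a sketch under assumptions: `rep_cols`, `new_names`, `groups`, `wrong` and `right` are names made up here, `all_of()` assumes a reasonably recent dplyr/tidyselect, and `rename_at()` is kept only because the script above uses it.

```r
library(dplyr)
library(purrr)

df <- data.frame(id = c(1, 2, 3, 4, 5), var = c("a", "d", "g", "f", "i"),
                 a1 = c(3, 5, 1, 2, 2), b1 = c(2, 4, 1, 2, 3),
                 a2 = c(8, 1, 2, 5, 1), b2 = c(1, 6, 4, 7, 2),
                 a3 = c(7, 7, 2, 3, 1), b3 = c(1, 1, 4, 9, 6))

rep_cols  <- names(df)[-(1:2)]                    # a1, b1, a2, b2, a3, b3
new_names <- unique(sub("\\d+$", "", rep_cols))   # "a", "b"
# Group the repeated columns by their trailing index: "1", "2", "3"
groups <- split(rep_cols, sub("^[^0-9]+", "", rep_cols))

# No renaming before binding: map_df() binds rows by column name, so
# a1/b1, a2/b2 and a3/b3 stay separate columns padded with NA.
wrong <- groups %>%
    map_df(~ df %>% select(id, var, all_of(.x)))

# Renaming inside map_df(): every piece already has columns a and b
# when it is bound, so the pieces stack into the same two columns.
right <- groups %>%
    map_df(~ df %>% select(id, var, all_of(.x)) %>%
               rename_at(3:4, ~ new_names)) %>%
    arrange(id)

dim(wrong)   # still 8 columns
dim(right)   # 4 columns: id, var, a, b
```

In the first version the columns are never given common names before binding, so a later rename can only relabel the first set; in the second, each piece is renamed before it is stacked.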

Recommended answer

This isn't very scalable, so if you end up needing more than these 3 pairs of columns, go with @akrun's answer. I just wanted to point out that the bind_rows snippet you included could, in fact, be done in one pipe:

library(tidyverse)


bind_rows(
        df %>% select(id, var, a = a1, b = b1),
        df %>% select(id, var, a = a2, b = b2),
        df %>% select(id, var, a = a3, b = b3)
    ) %>%
    arrange(id, var)
#>    id var a b
#> 1   1   a 3 2
#> 2   1   a 8 1
#> 3   1   a 7 1
#> 4   2   d 5 4
#> 5   2   d 1 6
#> 6   2   d 7 1
#> 7   3   g 1 1
#> 8   3   g 2 4
#> 9   3   g 2 4
#> 10  4   f 2 2
#> 11  4   f 5 7
#> 12  4   f 3 9
#> 13  5   i 2 3
#> 14  5   i 1 2
#> 15  5   i 1 6

Created on 2018-05-07 by the reprex package (v0.2.0).

If you want something that scales and you like the map_* functions (from purrr in the tidyverse), you can abstract the above pipeline:

1:3 %>%
    map_df(~select(df, id, var, ends_with(as.character(.))) %>% 
                    setNames(c("id", "var", "a", "b"))) %>%
    arrange(id, var)

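where 1:3 is simply the sequence of pair indices you have. Note that ends_with() matches the literal suffix, so with ten or more pairs "1" would also match columns ending in "11" or "21".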
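
As a further option that is not part of the answer above: tidyr 1.0.0 and later provide pivot_longer(), whose special ".value" entry in names_to is meant for this kind of paired layout. The sketch below assumes such a tidyr version and that every repeated column is named as lowercase letters followed by a number; the helper column `set` and the result name `long` are invented here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2, 3, 4, 5), var = c("a", "d", "g", "f", "i"),
                 a1 = c(3, 5, 1, 2, 2), b1 = c(2, 4, 1, 2, 3),
                 a2 = c(8, 1, 2, 5, 1), b2 = c(1, 6, 4, 7, 2),
                 a3 = c(7, 7, 2, 3, 1), b3 = c(1, 1, 4, 9, 6))

long <- df %>%
    pivot_longer(
        cols = -c(id, var),
        # ".value" keeps the letter part as the output column name (a, b);
        # the numeric part goes into the helper column "set".
        names_to = c(".value", "set"),
        names_pattern = "^([a-z]+)([0-9]+)$"
    ) %>%
    select(-set)   # drop the pair index to match the desired layout

long
```

Column names with digits in the stem (such as lg1t1 in the real data) would need a different names_pattern, so treat the pattern above as specific to this example.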
