Gather multiple groups of columns in R

Question

I have a wide dataframe that I need to gather or melt into a tall dataframe. The part that I'm stuck on is that I have groups of columns that need to remain associated/grouped.

I have 2 users for each form submission and 3 columns of data for each user. I'd like to take these 6 columns and essentially stack them in groups of 3 so that each user is a separate observation.

This is a sample of more or less what my data looks like:

wide <- data.frame(
    form.ID     = c(1, 2), 
    entry.date  = c("2016-07-01", "2016-06-15"), 
    user.1      = c("Joe", "Sam"), 
    user.1.ID   = c("A1", "A2"), 
    user.1.data = c("foo", "lorem"),
    user.2      = c("Jane", "Sue"), 
    user.2.ID   = c("B1", "B2"),
    user.2.data = c("bar", "ipsum")
)

wide
#   form.ID entry.date user.1 user.1.ID user.1.data user.2 user.2.ID user.2.data
# 1       1 2016-07-01    Joe        A1         foo   Jane        B1         bar
# 2       2 2016-06-15    Sam        A2       lorem    Sue        B2       ipsum

This is the desired end state:

#   form.ID  entry.date   user   user.ID   user.data
# 1       1  2016-07-01    Joe        A1         foo
# 1       1  2016-07-01   Jane        B1         bar
# 2       2  2016-06-15    Sam        A2       lorem    
# 2       2  2016-06-15    Sue        B2       ipsum

I found this question, but I can't get the given answers to work in my case:

Gather multiple sets of columns

I've tried:

library(tidyr)

tall.almost <- gather(wide, user.n, user.name, user.1, user.2)
tall.almost
#   form.ID entry.date user.1.ID user.1.data user.2.ID user.2.data user.n user.name
# 1       1 2016-07-01        A1         foo        B1         bar user.1       Joe
# 2       2 2016-06-15        A2       lorem        B2       ipsum user.1       Sam
# 3       1 2016-07-01        A1         foo        B1         bar user.2      Jane
# 4       2 2016-06-15        A2       lorem        B2       ipsum user.2       Sue

I thought to use a sequence of gather() functions like the one above, but I get duplicated data.

I've also tried:

tall.not.quite <- gather(wide, user.n, user.name, -form.ID, -entry.date)
tall.not.quite
#    form.ID entry.date      user.n user.name
# 1        1 2016-07-01      user.1       Joe
# 2        2 2016-06-15      user.1       Sam
# 3        1 2016-07-01   user.1.ID        A1
# 4        2 2016-06-15   user.1.ID        A2
# 5        1 2016-07-01 user.1.data       foo
# 6        2 2016-06-15 user.1.data     lorem
# 7        1 2016-07-01      user.2      Jane
# 8        2 2016-06-15      user.2       Sue
# 9        1 2016-07-01   user.2.ID        B1
# 10       2 2016-06-15   user.2.ID        B2
# 11       1 2016-07-01 user.2.data       bar
# 12       2 2016-06-15 user.2.data     ipsum

My thinking was that I could then use spread() to pull out the user.n.ID and user.n.data fields, but I can't get that to work either. I end up back where I started.

I'm pretty good and stuck. This R newbie would really appreciate any help.

Thanks!

Recommended answer

We can use melt from data.table which can take multiple measure columns.

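library(data.table)

# setDT() converts wide to a data.table by reference.
# Each pattern selects one group of columns, paired in order with value.name.
melt(setDT(wide), measure = patterns("\\d+$", "user.*ID$", "data$"),
   value.name = c("user", "user.ID", "user.data"))[,
    variable:= NULL][order(form.ID)]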
#     form.ID entry.date user user.ID user.data
# 1:       1 2016-07-01  Joe      A1       foo
# 2:       1 2016-07-01 Jane      B1       bar
# 3:       2 2016-06-15  Sam      A2     lorem
# 4:       2 2016-06-15  Sue      B2     ipsum
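
As a quick sanity check on the regular expressions used above, here is a small base R sketch showing which column names each pattern picks up. The cols vector is just the column names of wide retyped by hand, and grep() is used here only as a stand-in for the way patterns() selects columns, so treat it as an illustration rather than part of the original answer.

```r
# Column names of the example data frame, retyped by hand (assumption:
# they match names(wide) from the question).
cols <- c("form.ID", "entry.date",
          "user.1", "user.1.ID", "user.1.data",
          "user.2", "user.2.ID", "user.2.data")

# Names ending in digits: the user name columns.
user_cols <- grep("\\d+$", cols, value = TRUE)

# Names starting somewhere with "user" and ending in "ID": the user ID
# columns. form.ID is excluded because it does not contain "user".
id_cols <- grep("user.*ID$", cols, value = TRUE)

# Names ending in "data": the user data columns. entry.date is excluded
# because it ends in "date", not "data".
data_cols <- grep("data$", cols, value = TRUE)

print(user_cols)
print(id_cols)
print(data_cols)
```

Each pattern matches exactly two columns, one per user, and none of them matches form.ID or entry.date, which is why those two are kept as identifier columns. If a real data set has other columns ending in a digit or in "data", the patterns would need to be tightened.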
