R: Converting wide format to long format with multiple 3 time period variables

Question

Apologies if this is a simple question, but I haven't been able to find a simple solution after searching. I'm fairly new to R and am having trouble converting wide format to long format using either the melt (reshape2) or gather (tidyr) functions. The dataset I'm working with contains 22 different variables, each measured at 3 time periods. The problem occurs when I try to convert all of these from wide to long format at once. I have had success converting them individually, but that is very inefficient and long-winded, so I was wondering if anyone could suggest a simpler solution. Below is a sample dataset I created that is formatted in a similar way to the dataset I am working with:

Subject <- c(1, 2, 3)
BlueTime1 <- c(2, 5, 6)
BlueTime2 <- c(4, 6, 7)
BlueTime3 <- c(1, 2, 3)
RedTime1 <- c(2, 5, 6)
RedTime2 <- c(4, 6, 7)
RedTime3 <- c(1, 2, 3)
GreenTime1 <- c(2, 5, 6)
GreenTime2 <- c(4, 6, 7)
GreenTime3 <- c(1, 2, 3)

sample.df <- data.frame(Subject, BlueTime1, BlueTime2, BlueTime3,
                    RedTime1, RedTime2, RedTime3,
                    GreenTime1,GreenTime2, GreenTime3)

A solution that has worked for me is to use the gather function from tidyr, arranging the data by Subject (so that each subject's data is grouped together), and then selecting only the subject, time period, and rating. This was done for each variable (in my case 22).

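# One-time setup: install the packages if they are not already available
install.packages("dplyr")
install.packages("tidyr")
library(dplyr)
library(tidyr)

# Stack the three Blue columns into a key column (Time_Blue)
# and a value column (Rating_Blue)
BlueGather <- gather(sample.df, Time_Blue, Rating_Blue,
                     c(BlueTime1, BlueTime2, BlueTime3))

# Group each subject's rows together
BlueSorted <- arrange(BlueGather, Subject)

# Keep only the subject, time period, and rating
BlueSubtracted <- select(BlueSorted, Subject, Time_Blue, Rating_Blue)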

After running this code for each variable, I combine everything into one data frame. This seems very slow and inefficient to me, and I was hoping that someone could help me find a simpler solution. Thank you!

Recommended answer

We can use melt from data.table, which accepts several groups of measure columns at once, each group specified as a regex through patterns():

library(data.table)
melt(setDT(sample.df), measure = patterns("^Blue", "^Red", "^Green"), 
     value.name = c("BlueTime", "RedTime", "GreenTime"), variable.name = "time")
#   Subject time BlueTime RedTime GreenTime
#1:       1    1        2       2         2
#2:       2    1        5       5         5
#3:       3    1        6       6         6
#4:       1    2        4       4         4
#5:       2    2        6       6         6
#6:       3    2        7       7         7
#7:       1    3        1       1         1
#8:       2    3        2       2         2
#9:       3    3        3       3         3
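
Since the question mentions tidyr, here is a minimal sketch of the same reshape using pivot_longer instead of melt. This is a different technique from the answer above, not part of it, and it assumes tidyr 1.0.0 or later. The regex in names_pattern and the object name long.df are illustrative choices; the regex assumes every measure column is a name with no digits followed by a numeric suffix.

```r
library(tidyr)

# Same sample data as in the question
sample.df <- data.frame(
  Subject    = c(1, 2, 3),
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

# Split each column name into a stem and a numeric suffix.
# ".value" tells pivot_longer to turn each distinct stem into its own
# value column; the suffix goes into a new "time" column (as character).
long.df <- pivot_longer(
  sample.df,
  cols          = -Subject,
  names_to      = c(".value", "time"),
  names_pattern = "^(\\D+)(\\d+)$"
)
```

Unlike the melt call, this sorts rows by Subject first, and the time column is character rather than a factor. Nothing here needs to be repeated per variable, so it should extend to 22 variables as long as the column names follow the same stem-plus-number convention.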

Or, as @StevenBeaupré mentioned in the comments, if there are many patterns, one option is to build the patterns argument from the dataset's column names by stripping the digits from them:

melt(setDT(sample.df),
     measure = patterns(as.list(unique(sub("\\d+", "", names(sample.df)[-1])))),
     value.name = c("BlueTime", "RedTime", "GreenTime"),
     variable.name = "time")
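
With 22 variables, typing out value.name by hand is also tedious. The following is a minimal sketch, not part of the original answer, that derives both the measure groups and the value names from the column names. It relies on melt accepting a list of column-name vectors for measure.vars. The names stems, measure.cols, and long.dt are hypothetical, and the sketch assumes every variable has the same number of time columns in the same order, because the time index is simply the position within each group.

```r
library(data.table)

# Same sample data as in the question
sample.df <- data.frame(
  Subject    = c(1, 2, 3),
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

# Strip the trailing digits to get one stem per variable
stems <- unique(sub("\\d+$", "", setdiff(names(sample.df), "Subject")))

# For each stem, collect its columns (stem followed only by digits)
measure.cols <- lapply(stems, function(s) {
  grep(paste0("^", s, "\\d+$"), names(sample.df), value = TRUE)
})

# as.data.table() makes a copy, so sample.df itself is left unchanged
long.dt <- melt(as.data.table(sample.df),
                id.vars       = "Subject",
                measure.vars  = measure.cols,
                value.name    = stems,
                variable.name = "time")
```

Anchoring the regex with ^ and \\d+$ keeps a stem such as "Blue" from also matching a column like "BlueGreenTime1", which an unanchored pattern would pick up.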
