R: Converting wide format to long format with multiple variables over 3 time periods
Question
Apologies if this is a simple question, but I haven't been able to find a simple solution after searching. I'm fairly new to R and am having trouble converting wide format to long format using either the melt (reshape2) or gather (tidyr) functions. The dataset I'm working with contains 22 different variables, each measured at 3 time periods. The problem occurs when I try to convert all of these from wide to long format at once. I have had success converting them individually, but that is inefficient and long-winded, so I was wondering if anyone could suggest a simpler solution. Below is a sample dataset I created that is formatted similarly to the one I am working with:
Subject <- c(1, 2, 3)
BlueTime1 <- c(2, 5, 6)
BlueTime2 <- c(4, 6, 7)
BlueTime3 <- c(1, 2, 3)
RedTime1 <- c(2, 5, 6)
RedTime2 <- c(4, 6, 7)
RedTime3 <- c(1, 2, 3)
GreenTime1 <- c(2, 5, 6)
GreenTime2 <- c(4, 6, 7)
GreenTime3 <- c(1, 2, 3)
sample.df <- data.frame(Subject, BlueTime1, BlueTime2, BlueTime3,
                        RedTime1, RedTime2, RedTime3,
                        GreenTime1, GreenTime2, GreenTime3)
A solution that has worked for me is to use the gather function from tidyr, arranging the data by Subject (so that each subject's data is grouped together), and then selecting only the subject, time period, and rating. This was done for each variable (in my case 22).
install.packages("dplyr")
install.packages("tidyr")
library(dplyr)
library(tidyr)
BlueGather <- gather(sample.df, Time_Blue, Rating_Blue,
                     c(BlueTime1, BlueTime2, BlueTime3))
BlueSorted <- arrange(BlueGather, Subject)
BlueSubtracted <- select(BlueSorted, Subject, Time_Blue, Rating_Blue)
After this code, I combine everything into one data frame. This seems very slow and inefficient to me, and I was hoping that someone could help me find a simpler solution. Thank you!
Recommended answer
We can use melt from data.table, which can take multiple measure columns as regex patterns:
library(data.table)
melt(setDT(sample.df), measure = patterns("^Blue", "^Red", "^Green"),
     value.name = c("BlueTime", "RedTime", "GreenTime"), variable.name = "time")
#    Subject time BlueTime RedTime GreenTime
# 1:       1    1        2       2         2
# 2:       2    1        5       5         5
# 3:       3    1        6       6         6
# 4:       1    2        4       4         4
# 5:       2    2        6       6         6
# 6:       3    2        7       7         7
# 7:       1    3        1       1         1
# 8:       2    3        2       2         2
# 9:       3    3        3       3         3
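A note on the time column in that output: when several pattern groups are melted at once, data.table fills it with the position of each column inside its group (1, 2, 3), by default as a factor, rather than with the digit parsed from the column name. The two agree here only because the columns are ordered Time1, Time2, Time3 within each colour. A minimal sketch of turning it into an integer and grouping each subject's rows together, assuming that ordering holds (the name long is just illustrative, and as.data.table() is used instead of setDT() so the original data frame is left untouched):

```r
library(data.table)

# Same toy data as in the question
sample.df <- data.frame(
  Subject    = c(1, 2, 3),
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

long <- melt(as.data.table(sample.df),
             measure.vars  = patterns("^Blue", "^Red", "^Green"),
             value.name    = c("BlueTime", "RedTime", "GreenTime"),
             variable.name = "time")

# as.character() first, so this works whether `time` is a factor or character
long[, time := as.integer(as.character(time))]

# Group each subject's rows together, as in the question's arrange() step
setorder(long, Subject, time)
```

If the real columns are not in time order within each group, the positions will not match the period numbers, so it is worth checking names(sample.df) first.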
Or, as @StevenBeaupré mentioned in the comments, if there are many patterns, one option is to build the patterns argument from the names of the dataset after stripping the digits:
melt(setDT(sample.df),
     measure = patterns(as.list(unique(sub("\\d+", "", names(sample.df)[-1])))),
     value.name = c("BlueTime", "RedTime", "GreenTime"),
     variable.name = "time")
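With 22 variables, typing out value.name by hand would still be tedious. One possible extension, offered as a sketch rather than as part of the original answer, is to reuse the same stripped prefixes for both the patterns and the value names. It assumes every measure column is named as a prefix followed by a trailing number and that Subject is the only id column; measure_cols, prefixes and long are illustrative names.

```r
library(data.table)

# Same toy data as in the question
sample.df <- data.frame(
  Subject    = c(1, 2, 3),
  BlueTime1  = c(2, 5, 6), BlueTime2  = c(4, 6, 7), BlueTime3  = c(1, 2, 3),
  RedTime1   = c(2, 5, 6), RedTime2   = c(4, 6, 7), RedTime3   = c(1, 2, 3),
  GreenTime1 = c(2, 5, 6), GreenTime2 = c(4, 6, 7), GreenTime3 = c(1, 2, 3)
)

# Every column except the id column is treated as a measure column (assumption)
measure_cols <- setdiff(names(sample.df), "Subject")

# Strip the trailing digits: "BlueTime" "RedTime" "GreenTime"
prefixes <- unique(sub("\\d+$", "", measure_cols))

# One anchored regex per prefix; the same prefixes double as value names
long <- melt(as.data.table(sample.df),
             id.vars       = "Subject",
             measure.vars  = patterns(paste0("^", prefixes, "\\d+$")),
             value.name    = prefixes,
             variable.name = "time")
```

Anchoring each regex with ^ and \\d+$ is a defensive choice: it keeps a prefix such as Red from also matching a hypothetical column like RedTimeTotal.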