How do I convert a wide dataframe to a long dataframe for a multilevel structure with 'quadruple nesting'?
I conducted a study that, in retrospect (one lives, one learns :-)) appears to generate multilevel data. Now I'm trying to restructure the dataset from wide to long so that I can analyse it using e.g. lme4.
In doing so, I encounter an, um, challenge that I've run into a few times before, but for which I've never found a good solution. I've searched again this time, but I'm probably using the wrong keywords, or this problem is much rarer than I thought.
Basically, in this dataset, the variable names indicate which measure each column holds. I asked participants to grade (rate) interventions (could be anything really). Each intervention is in one of 6 behavioral domains. In addition, participants rated each intervention either when it was presented on its own, or simultaneously with one other intervention, or with two other interventions. There were three types of interventions, and they were all rated before (t0) and after (t1) I presented them with some information.
So, in effect, I have a dataframe that can be regenerated like this:
### Elements of the variable names
measurementMomentsVector <- c("t0", "t1");
interventionTypesVector <- c("fear", "know", "scd");
nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3);
behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh");
### Generate a vector with all variable names
variableNames <-
  apply(expand.grid(measurementMomentsVector,
                    interventionTypesVector,
                    nrOfInterventionsSimultaneouslyVector,
                    behaviorDomainsVector),
        1, paste0, collapse="_");
### Generate 5 'participants' worth of data
wideData <- data.frame(matrix(rnorm(5*length(variableNames)), nrow=5));
### Assign names
names(wideData) <- variableNames;
### Add a unique id variable for every participant
wideData$id <- 1:5;
So using head(wideData)[, 1:5] you can see roughly what the dataframe looks like:
t0_fear_1_diet t1_fear_1_diet t0_know_1_diet t1_know_1_diet t0_scd_1_diet
1 -0.9338191 0.9747453 1.0069036 0.3500103 -0.844699708
2 0.8921867 1.3687834 -1.2005791 0.2747955 1.316768219
3 1.6200200 0.5245470 -1.2910586 1.3211912 -0.174795144
4 0.1543738 0.7535642 0.4726131 -0.3464789 -0.009190702
5 -1.3676692 -0.4491574 -2.0902003 -0.3484678 -2.537501824
Now, I want to convert this data to a long dataframe with 6 variables, for example 'id', 'measurementMoment', 'interventionType', 'nrOfInterventionsSimultaneously', 'behaviorDomain', and 'evaluation', where the first variable denotes the participant to which a record belongs, the last variable is the score (rating, grade, evaluation) the participant gave a specific intervention, and the four variables in between indicate exactly which intervention is being rated.
I can probably write some 'custom' code just for this problem, but I expect R 'has something for this'. I've been playing around with reshape2, e.g.:
longData <- reshape(wideData, varying=1:(ncol(wideData)-1),
                    idvar="id",
                    sep="_", direction="long")
But it doesn't manage to guess the time-varying variables:
Error in guess(varying) :
failed to guess time-varying variables from their names
I have struggled with this a few times now, and I haven't managed to find any answers online. Now I really need to move on, so I thought I'd try this as a last effort before resorting to writing something custom-made :-)
I would greatly appreciate any pointers anybody can give!!!
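On the error above: as far as I can tell from how base R's reshape() behaves, with sep="_" it splits each column name on the separator and expects exactly two pieces (a variable name and a time). The names here have four parts, which is presumably why the guess fails. A minimal base R sketch of that split (nm and parts are just illustrative names):

```r
# One of the 108 column names from the question
nm <- "t0_fear_1_diet"

# Splitting on the separator gives four pieces, not the two
# (name, time) pieces that reshape() tries to guess
parts <- strsplit(nm, "_", fixed = TRUE)[[1]]
print(parts)
print(length(parts))
```

So the labels need to be split into four grouping columns explicitly, rather than guessed by reshape().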
I think your problem can be solved with a two-step approach:
- melt your data into a long data.frame (or, as I did, into a long data.table)
- split the variable column with all the labels into separate columns for each required grouping variable.
As the information for this is in the labels, this can easily be achieved with the tstrsplit function from the data.table package.
This is what you might be looking for:
library(data.table)
longData <- melt(setDT(wideData), id.vars="id")
longData[, c("moment", "intervention", "number", "behavior") :=
           tstrsplit(variable, "_", type.convert = TRUE)
         ][, variable:=NULL]
the result:
> head(longData,15)
id value moment intervention number behavior
1: 1 -0.07747254 t0 fear 1 diet
2: 2 -0.76207379 t0 fear 1 diet
3: 3 1.15501244 t0 fear 1 diet
4: 4 1.24792369 t0 fear 1 diet
5: 5 -0.28226121 t0 fear 1 diet
6: 1 -1.04875354 t1 fear 1 diet
7: 2 -0.91436882 t1 fear 1 diet
8: 3 0.72863487 t1 fear 1 diet
9: 4 0.10934261 t1 fear 1 diet
10: 5 -0.06093002 t1 fear 1 diet
11: 1 -0.70725760 t0 know 1 diet
12: 2 1.06309003 t0 know 1 diet
13: 3 0.89501164 t0 know 1 diet
14: 4 1.48148316 t0 know 1 diet
15: 5 0.22086835 t0 know 1 diet
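If you would rather not depend on any package, the same two steps can be sketched in base R. This is only a sketch under assumptions: the data are regenerated with an arbitrary seed, and the names grid, measureCols, parts and longBase are mine, not from the question.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Rebuild a wide data frame shaped like the one in the question
grid <- expand.grid(moment = c("t0", "t1"),
                    intervention = c("fear", "know", "scd"),
                    number = 1:3,
                    behavior = c("diet", "pox", "alc", "smoking", "traff", "adh"))
variableNames <- do.call(paste, c(grid, sep = "_"))
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Step 1: stack all measurement columns into one long value column
measureCols <- setdiff(names(wideData), "id")
longBase <- data.frame(
  id       = rep(wideData$id, times = length(measureCols)),
  variable = rep(measureCols, each = nrow(wideData)),
  value    = unlist(wideData[measureCols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 2: split the labels into one column per grouping variable
parts <- do.call(rbind, strsplit(longBase$variable, "_", fixed = TRUE))
longBase$moment       <- parts[, 1]
longBase$intervention <- parts[, 2]
longBase$number       <- as.integer(parts[, 3])
longBase$behavior     <- parts[, 4]
longBase$variable     <- NULL

head(longBase)
```

With 5 participants and 2 x 3 x 3 x 6 = 108 measurement columns, longBase should have 540 rows and 6 columns.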
As an alternative to data.table, you can also split the variable column with the cSplit function of the splitstackshape package (you will have to rename the resulting variable columns afterwards though):
library(splitstackshape)
# Run this on the melted data while the variable column still exists
# (i.e. before the [, variable:=NULL] step above)
longData <- cSplit(longData, splitCols="variable", sep="_", direction="wide", type.convert=TRUE)
names(longData) <- c("id","value","moment","intervention","number","behavior")
or with tidyr:
library(tidyr)
# Also needs the melted data with the variable column still present
separate(longData, variable, c("moment", "intervention", "number", "behavior"), sep="_", remove=TRUE)
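For completeness, newer tidyr versions can do the melt and the split in one call. The sketch below assumes tidyr 1.0.0 or later, where pivot_longer() accepts names_sep; wideSmall and longTidy are hypothetical names, and the tiny data frame is only a stand-in for wideData.

```r
library(tidyr)

# Small stand-in for the wide data: two participants, two of the columns
wideSmall <- data.frame(id = 1:2,
                        t0_fear_1_diet = c(0.1, 0.2),
                        t1_know_2_pox  = c(0.3, 0.4))

# Reshape every column except id and split its name on "_" into four columns
longTidy <- pivot_longer(
  wideSmall,
  cols      = -id,
  names_to  = c("moment", "intervention", "number", "behavior"),
  names_sep = "_",
  values_to = "evaluation"
)
longTidy
```

Note that in this sketch the number column comes back as character, so you may want to convert it with as.integer() afterwards.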