How do I convert a wide dataframe to a long dataframe for a multilevel structure with 'quadruple nesting'?

Problem description

I conducted a study that, in retrospect (one lives, one learns :-)) appears to generate multilevel data. Now I'm trying to restructure the dataset from wide to long so that I can analyse it using e.g. lme4.

In doing so, I encountered an, um, challenge that I've run into a few times before, but for which I've never found a good solution. I searched again this time, but I'm probably using the wrong keywords, or this problem is much rarer than I thought.

Basically, in this dataset, the variable names indicate what was measured in each column. I asked participants to grade (rate) interventions (could be anything really). Each intervention belongs to one of 6 behavioral domains. In addition, participants rated each intervention either when it was presented on its own, simultaneously with one other intervention, or simultaneously with two other interventions. There were three types of interventions, and they were all rated before (t0) and after (t1) I presented the participants with some information. Each variable name therefore combines four parts separated by underscores: measurement moment, intervention type, number of simultaneously presented interventions, and behavioral domain (e.g. t0_fear_1_diet).

So, in effect, I have a dataframe that can be regenerated like this:

### Elements of the variable names
measurementMomentsVector <- c("t0", "t1");
interventionTypesVector <- c("fear", "know", "scd");
nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3);
behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh");

### Generate a vector with all variable names
variableNames <-
  apply(expand.grid(measurementMomentsVector,
                    interventionTypesVector,
                    nrOfInterventionsSimultaneouslyVector,
                    behaviorDomainsVector),
        1, paste0, collapse="_");

### Generate 5 'participants' worth of data
wideData <- data.frame(matrix(rnorm(5*length(variableNames)), nrow=5));

### Assign names
names(wideData) <- variableNames;

### Add a unique id variable for each participant
wideData$id <- 1:5;

So using head(wideData)[, 1:5] you can see roughly what the dataframe looks like:

  t0_fear_1_diet t1_fear_1_diet t0_know_1_diet t1_know_1_diet t0_scd_1_diet
1     -0.9338191      0.9747453      1.0069036      0.3500103  -0.844699708
2      0.8921867      1.3687834     -1.2005791      0.2747955   1.316768219
3      1.6200200      0.5245470     -1.2910586      1.3211912  -0.174795144
4      0.1543738      0.7535642      0.4726131     -0.3464789  -0.009190702
5     -1.3676692     -0.4491574     -2.0902003     -0.3484678  -2.537501824

Now, I want to convert this data to a long dataframe with 6 variables, for example 'id', 'measurementMoment', 'interventionType', 'nrOfInterventionsSimultaneously', 'behaviorDomain', and 'evaluation', where the first variable denotes the participant to whom a record belongs, the last variable is the score (rating, grade, evaluation) that participant gave a specific intervention, and the four variables in between indicate exactly which intervention is being rated.
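Because every wide column holds exactly one rating, the long frame described above should have one row per participant per rating column. As a quick sanity check, here is a small sketch that assumes the design vectors from the code above (2 moments × 3 intervention types × 3 simultaneity levels × 6 domains) and the 5 simulated participants:

```r
# Design factors, copied from the question's code
measurementMomentsVector <- c("t0", "t1")
interventionTypesVector <- c("fear", "know", "scd")
nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3)
behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh")

# One wide column per combination of the four factors
nRatingColumns <- length(measurementMomentsVector) *
  length(interventionTypesVector) *
  length(nrOfInterventionsSimultaneouslyVector) *
  length(behaviorDomainsVector)

# One long row per participant per rating column
nParticipants <- 5
nLongRows <- nParticipants * nRatingColumns

cat("rating columns:", nRatingColumns, "\n")  # 108
cat("long rows:", nLongRows, "\n")             # 540
```

So the target long dataframe should have 540 rows and 6 columns.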

I could probably write some 'custom' code just for this problem, but I expect R 'has something for this'. I've been playing around with base R's reshape() function, e.g.:

longData <- reshape(wideData, varying=1:(ncol(wideData)-1),
                    idvar="id",
                    sep="_", direction="long")

But it doesn't manage to guess the time-varying variables:

Error in guess(varying) : 
  failed to guess time-varying variables from their names
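For context, the error seems to come from how reshape() guesses the time-varying structure when only `sep` is given. As far as I can tell from its behaviour, it splits each varying name at `sep` and expects exactly two pieces (a variable stem and a time value). Names such as t0_fear_1_diet split into four pieces, so the guess fails. A minimal check of that split:

```r
# Two of the question's column names
columnNames <- c("t0_fear_1_diet", "t1_scd_3_adh")

# Split at the same separator that was passed to reshape()
pieces <- strsplit(columnNames, "_", fixed = TRUE)

# Each name yields four pieces, not the two (stem + time) reshape() can guess from
print(lengths(pieces))  # 4 4
```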

I have struggled with this a few times now, and I haven't managed to find any answers online. And now I really need to move on, so I thought I'd try this as a last effort before resorting to writing something custom-made :-)

I would greatly appreciate any pointers anybody can give!!!

Solution

I think your problem can be solved with a two-step approach:

  1. melt your data into a long data.frame (or, as I did, into a long data.table);
  2. split the variable column, which holds all the labels, into a separate column for each required grouping variable.
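For comparison, the same two steps can be sketched in base R without extra packages (this is not part of the original answer). Step 1 uses reshape() with v.names and times given explicitly, so nothing has to be guessed from the names; step 2 splits the stored labels with strsplit(). The seed and the helper column name condition are arbitrary choices for the sketch:

```r
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Rebuild the question's wide data (5 participants x 108 rating columns)
variableNames <- apply(expand.grid(c("t0", "t1"), c("fear", "know", "scd"), 1:3,
                                   c("diet", "pox", "alc", "smoking", "traff", "adh")),
                       1, paste0, collapse = "_")
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Step 1: stack all rating columns; each original column name goes into "condition"
ratingCols <- setdiff(names(wideData), "id")
longData <- reshape(wideData, direction = "long",
                    varying = ratingCols, v.names = "evaluation",
                    timevar = "condition", times = ratingCols, idvar = "id")

# Step 2: split each label into its four parts (a 540 x 4 character matrix)
parts <- do.call(rbind, strsplit(longData$condition, "_", fixed = TRUE))
longData$measurementMoment <- parts[, 1]
longData$interventionType <- parts[, 2]
longData$nrOfInterventionsSimultaneously <- as.integer(parts[, 3])
longData$behaviorDomain <- parts[, 4]

# Drop the helper label column and reset the "id.label" row names
longData$condition <- NULL
rownames(longData) <- NULL

head(longData)
```

Passing times = ratingCols is what keeps the full label available for step 2; without it, reshape() would number the "times" 1 to 108.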

As the information for this is in the labels, this can easily be achieved with the tstrsplit function from the data.table package.
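To see what tstrsplit() does on its own: it splits each string at the separator and returns one vector per position (a "transposed" strsplit()), which is why its result can be assigned to four new columns at once. Here is a small sketch on two hand-picked labels. With type.convert = TRUE, the third piece should come back as a number rather than a string:

```r
library(data.table)

# Two labels in the same format as the melted "variable" column
labels <- c("t0_fear_1_diet", "t1_know_2_alc")

# One list element per position: moment, intervention, number, behavior
parts <- tstrsplit(labels, "_", type.convert = TRUE)

str(parts)
```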

This is what you might be looking for:

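library(data.table)
# setDT() converts wideData to a data.table in place; melt() stacks every
# column except "id" into "variable" (old column name) and "value" (rating)
longData <- melt(setDT(wideData), id.vars="id")
# split labels like "t0_fear_1_diet" into four columns, then drop "variable"
longData[, c("moment", "intervention", "number", "behavior") := 
                tstrsplit(variable, "_", type.convert = TRUE)
       ][, variable:=NULL]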

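The result (the values differ from those in the question because the data are generated with rnorm() and no fixed seed):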

> head(longData,15)
    id       value moment intervention number behavior
 1:  1 -0.07747254     t0         fear      1     diet
 2:  2 -0.76207379     t0         fear      1     diet
 3:  3  1.15501244     t0         fear      1     diet
 4:  4  1.24792369     t0         fear      1     diet
 5:  5 -0.28226121     t0         fear      1     diet
 6:  1 -1.04875354     t1         fear      1     diet
 7:  2 -0.91436882     t1         fear      1     diet
 8:  3  0.72863487     t1         fear      1     diet
 9:  4  0.10934261     t1         fear      1     diet
10:  5 -0.06093002     t1         fear      1     diet
11:  1 -0.70725760     t0         know      1     diet
12:  2  1.06309003     t0         know      1     diet
13:  3  0.89501164     t0         know      1     diet
14:  4  1.48148316     t0         know      1     diet
15:  5  0.22086835     t0         know      1     diet
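Below is a self-contained variant of the data.table code above. It regenerates the data with an arbitrary seed and uses the column names the question asked for rather than the shorter names in the answer. It relies on melt()'s value.name argument to name the rating column evaluation:

```r
library(data.table)

set.seed(1)  # arbitrary seed for reproducibility

# Rebuild the question's wide data
variableNames <- apply(expand.grid(c("t0", "t1"), c("fear", "know", "scd"), 1:3,
                                   c("diet", "pox", "alc", "smoking", "traff", "adh")),
                       1, paste0, collapse = "_")
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Step 1: wide -> long; old column names go to "variable", ratings to "evaluation"
longData <- melt(setDT(wideData), id.vars = "id", value.name = "evaluation")

# Step 2: split the labels into the four grouping variables, then drop the label
longData[, c("measurementMoment", "interventionType",
             "nrOfInterventionsSimultaneously", "behaviorDomain") :=
           tstrsplit(variable, "_", type.convert = TRUE)]
longData[, variable := NULL]

head(longData)
```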


As an alternative to tstrsplit(), you can also split the variable column of the melted data (longData as returned by melt(), before the tstrsplit() step) with the cSplit function of the splitstackshape package (you will have to rename the resulting columns afterwards, though):

library(splitstackshape)
longData <- cSplit(longData, sep="_", "variable", "wide", type.convert=TRUE)
names(longData) <- c("id","value","moment","intervention","number","behavior")

or with tidyr:

library(tidyr)
longData <- separate(longData, variable, c("moment", "intervention", "number", "behavior"), sep="_", remove=TRUE, convert=TRUE)
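Newer versions of tidyr can also do both steps in one call with pivot_longer(). Its names_to and names_sep arguments split the column names while reshaping. This is a substitute technique, not part of the original answer, and the seed and column names are illustrative:

```r
library(tidyr)

set.seed(1)  # arbitrary seed for reproducibility

# Rebuild the question's wide data
variableNames <- apply(expand.grid(c("t0", "t1"), c("fear", "know", "scd"), 1:3,
                                   c("diet", "pox", "alc", "smoking", "traff", "adh")),
                       1, paste0, collapse = "_")
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Reshape every column except id and split its name at "_" in a single step
longData <- pivot_longer(
  wideData,
  cols = -id,
  names_to = c("measurementMoment", "interventionType",
               "nrOfInterventionsSimultaneously", "behaviorDomain"),
  names_sep = "_",
  names_transform = list(nrOfInterventionsSimultaneously = as.integer),
  values_to = "evaluation"
)

head(longData)
```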
