Reshape messy longitudinal survey data containing multiple different variables, wide to long


Question

I hope I'm not reinventing the wheel, but I don't think the following can be answered using reshape.

I have messy longitudinal survey data that I want to convert from wide to long format. By messy I mean:

  • I have multiple variable types (numeric, factor, logical)
  • Not every variable was collected at every timepoint.

For example:

data <- read.table(header=TRUE, text='
  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
  1      TRUE    FALSE 87717.76 82281.25  happy  happy filler
  2      TRUE     TRUE 70795.53 54995.19  so-so  happy filler
  3     FALSE    FALSE 48012.77 47650.47    sad  so-so filler
 ')

I could not work out how to reshape the data using reshape, and I keep getting the error message 'times' is wrong length, which I assume is because not every variable is recorded on every occasion. Also, I don't think melt and cast from reshape2 will work, as they require all measured variables to be of the same type.

I came up with the following solution, which may help others. It selects variables by timepoint, renames them, and then uses rbind.fill from plyr to concatenate them together. But I wonder whether I'm missing something with reshape, or whether this can be done more easily using tidyr or another package?

reshapeLong2 <- function(data, varying = NULL, timevar = "time", idvar = "id", sep = ".", patterns = NULL) {

  library(plyr)  # provides rbind.fill(); unlike require(), library() stops with an error if plyr is missing
  substrRight <- function(x, n){
    substr(x, nchar(x)-n+1, nchar(x))
  }

  if (is.null(varying))
    varying <- names(data)[! names(data) %in% idvar]

  # Create pattern if not specified, guesses by taking numbers given at end of variable names.
  if (is.null(patterns)) {
    times <- unique(na.omit(as.numeric(substrRight(varying, 1))))
    times <- sort(times)  # `times[order = times]` indexed by value rather than sorting
    patterns <- paste0(sep, times)    
  }

  # Create list of datasets by study time
  ls.df <- lapply(patterns, function(pattern) {
    var.old <- grep(pattern, x = varying, value = TRUE)
    var.new <- gsub(pattern, "", x = var.old)
    df <- data[, c(idvar, var.old)]
    names(df) <- c(idvar, var.new)
    df[, timevar] <- match(pattern, patterns)
    return(df)
  })

  # Concatenate datasets together
  dfs <- rbind.fill(ls.df)
  return(dfs)
}

> reshapeLong2(df.test)
  id inlove  mood time   income
1  1  FALSE   sad    1       NA
2  2   TRUE so-so    1       NA
3  3   TRUE   sad    1       NA
4  1   TRUE  <NA>    2 27766.13
5  2  FALSE  <NA>    2 74395.30
6  3   TRUE  <NA>    2 89004.95
7  1     NA   sad    3 27270.07
8  2     NA so-so    3 36971.64
9  3     NA so-so    3 85986.96
Warning message:
In na.omit(as.numeric(substrRight(varying, 1))) :
  NAs introduced by coercion

Note that the warning message indicates that some variables were dropped (in this case "random"). The warning is not shown if every variable is listed as either idvar or varying.
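To make the source of that warning concrete, here is a minimal sketch of the suffix-guessing step on its own, assuming the column names from the example data. The names last_char and dropped are illustrative and do not appear in the function above.

```r
# Column names from the example; "random" has no time suffix.
varying <- c("inlove.1", "inlove.2", "income.2", "income.3",
             "mood.1", "mood.3", "random")

# Same idea as substrRight(varying, 1): take the last character of each name.
last_char <- substr(varying, nchar(varying), nchar(varying))

# as.numeric() returns NA (normally with a coercion warning) for non-digits
# such as the trailing "m" of "random"; the warning is suppressed here.
times <- suppressWarnings(as.numeric(last_char))

# Names with no numeric suffix match none of the time patterns,
# so those columns never make it into the long data.
dropped <- varying[is.na(times)]
print(dropped)
```

Listing such columns in idvar, or leaving them out of varying, avoids the coercion and therefore the warning.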

Recommended answer

If you add the missing varname.TIME columns, filled with NA, for every time point that was not recorded, you can then just use reshape:

uniqnames <- c("inlove","income","mood")
allnames  <- make.unique(rep(uniqnames,4))[-(seq_along(uniqnames))]
#[1] "inlove.1" "income.1" "mood.1"   "inlove.2" "income.2" "mood.2" ...
data[setdiff(allnames, names(data)[-1])] <- NA
#  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random income.1 mood.2 inlove.3
#1  1     TRUE    FALSE 87717.76 82281.25  happy  happy filler       NA     NA       NA
#2  2     TRUE     TRUE 70795.53 54995.19  so-so  happy filler       NA     NA       NA
#3  3    FALSE    FALSE 48012.77 47650.47    sad  so-so filler       NA     NA       NA
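The hard-coded uniqnames and the repeat count of 4 are specific to this example. As a sketch only, the same set of names could probably be derived from the columns themselves, assuming every time-varying column ends in a dot followed by digits. The names suffix_re, timed, stubs and times are illustrative, not part of the answer.

```r
data <- read.table(header = TRUE, text = "
  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
  1      TRUE    FALSE 87717.76 82281.25  happy  happy filler
  2      TRUE     TRUE 70795.53 54995.19  so-so  happy filler
  3     FALSE    FALSE 48012.77 47650.47    sad  so-so filler
")

# Assumption: time-varying columns are exactly those ending in ".<digits>".
suffix_re <- "\\.[0-9]+$"
timed <- grep(suffix_re, names(data), value = TRUE)

# Variable stems ("inlove", "income", "mood") and the observed time points.
stubs <- unique(sub(suffix_re, "", timed))
times <- sort(unique(as.integer(sub("^.*\\.", "", timed))))

# Every stem/time combination, with stems varying fastest within each time.
allnames <- as.vector(outer(stubs, times, paste, sep = "."))

# Add the combinations that were never collected as all-NA columns.
data[setdiff(allnames, names(data))] <- NA
```

The resulting allnames can then be passed to reshape as varying in the same way.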

reshape(data, idvar="id", direction="long", sep=".", varying=allnames)

#    id random time inlove   income  mood
#1.1  1 filler    1   TRUE       NA happy
#2.1  2 filler    1   TRUE       NA so-so
#3.1  3 filler    1  FALSE       NA   sad
#1.2  1 filler    2  FALSE 87717.76  <NA>
#2.2  2 filler    2   TRUE 70795.53  <NA>
#3.2  3 filler    2  FALSE 48012.77  <NA>
#1.3  1 filler    3     NA 82281.25 happy
#2.3  2 filler    3     NA 54995.19 happy
#3.3  3 filler    3     NA 47650.47 so-so
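The question also asks whether tidyr could make this easier. The following is a hedged sketch rather than part of the answer above: it assumes tidyr 1.1.0 or later is installed (for names_transform) and that every time-varying column is named variable.TIME. The special ".value" entry in names_to tells pivot_longer to keep one output column per variable, so mixed types are preserved and time points that were never collected should come out as NA.

```r
library(tidyr)  # assumption: tidyr >= 1.1.0

data <- read.table(header = TRUE, text = "
  id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
  1      TRUE    FALSE 87717.76 82281.25  happy  happy filler
  2      TRUE     TRUE 70795.53 54995.19  so-so  happy filler
  3     FALSE    FALSE 48012.77 47650.47    sad  so-so filler
")

long <- pivot_longer(
  data,
  cols = matches("\\.[0-9]+$"),            # only columns with a time suffix
  names_to = c(".value", "time"),          # the part before "." names the output column
  names_sep = "\\.",                       # split each name at the literal dot
  names_transform = list(time = as.integer)
)

print(long)
```

Unlike the reshape approach, this needs no placeholder NA columns, and columns such as random that have no time suffix are carried along as identifiers.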
