将多列组合成整洁的数据 [英] Combine Multiple Columns Into Tidy Data

查看:38
本文介绍了将多列组合成整洁的数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据集如下所示:

unique.id abx.1    start.1     stop.1 abx.2    start.2     stop.2 abx.3    start.3     stop.3 abx.4    start.4
1         1  Moxi 2014-01-01 2014-01-07  PenG 2014-01-01 2014-01-07 Vanco 2014-01-01 2014-01-07  Moxi 2014-01-01
2         2  Moxi 2014-01-01 2014-01-02 Cipro 2014-01-01 2014-01-02  PenG 2014-01-01 2014-01-02 Vanco 2014-01-01
3         3 Cipro 2014-01-01 2014-01-05 Vanco 2014-01-01 2014-01-05 Cipro 2014-01-01 2014-01-05 Vanco 2014-01-01
4         4 Vanco 2014-01-02 2014-01-03 Cipro 2014-01-02 2014-01-03 Cipro 2014-01-02 2014-01-03  PenG 2014-01-02
5         5 Vanco 2014-01-01 2014-01-02  PenG 2014-01-01 2014-01-02  PenG 2014-01-01 2014-01-02 Cipro 2014-01-01
      stop.4    intervention
1 2014-01-07       0
2 2014-01-02       0
3 2014-01-05       1
4 2014-01-03       1
5 2014-01-02       0

用一些代码来创建这个:

With some code to create this:

 abxoptions <- c("Cipro", "Moxi", "PenG", "Vanco")
      df3 <- data.frame(
      unique.id = 1:5,
      abx.1 = sample(abxoptions,5, replace=TRUE),
      start.1 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
      stop.1  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
      abx.2 = sample(abxoptions,5, replace=TRUE),         
      start.2 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
      stop.2  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
      abx.3 = sample(abxoptions,5, replace=TRUE),         
      start.3 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
      stop.3  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
      abx.4 = sample(abxoptions,5, replace=TRUE),         
      start.4 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
      stop.4  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
      intervention = c(0,0,1,1,0)

)

我想把这些数据整理成这样:

I would like to tidy this data to look like this:

unique.id    abx     start    stop           intervention
1            Moxi    2014-01-10 2014-01-07      0
1            Pen G   2014-01-01 2014-01-07      0
1            Vanco   2014-01-01 2014-01-07      0
1            Moxi    2014-01-01 2014-01-07      0  etc etc

以下解决方案没有让我到达我需要的地方:收集多组列将多列合二为一

The following solutions didn't get me where I needed: Gather multiple sets of columns and Combining multiple columns into one

我怀疑 Hadley 令人惊叹的 tidyr pakcage 是要走的路……只是想不通.任何帮助将不胜感激.

I suspect that Hadley's amazing tidyr pakcage is the way to go...just can't figure this out. Any help would be greatly appreciated.

推荐答案

几乎所有的数据整理问题都可以通过三步解决:

Almost every data tidying problem can be solved in three steps:

  1. 收集所有非变量列
  2. 将colname"列分成多个变量
  3. 重新传播数据

(通常您只需要其中的一两个,但我认为它们几乎总是按此顺序排列).

(often you'll only need one or two of these, but I think they're almost always in this order).

对于您的数据:

  1. 唯一已经是变量的列是 unique.id
  2. 您需要将当前列名拆分为变量和数字
  3. 然后您需要将变量"变量放回列中

这看起来像:

library(tidyr)
library(dplyr)

df3 %>%
  gather(col, value, -unique.id, -intervention) %>%
  separate(col, c("variable", "number")) %>%
  spread(variable, value, convert = TRUE) %>%
  mutate(start = as.Date(start, "1970-01-01"), stop = as.Date(stop, "1970-01-01"))

你的情况有点复杂,因为你有两种类型的变量,所以最后需要恢复类型.

Your case is a bit more complicated because you have two types of variables, so you need to restore the types at the end.

这篇关于将多列组合成整洁的数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆