Combine Multiple Columns Into Tidy Data


Problem description

Sorry, I am sure there's a way to do this. However, I'm having trouble interpreting the solutions to fit my problem.

My dataset looks like this:

unique.id abx.1    start.1     stop.1 abx.2    start.2     stop.2 abx.3    start.3     stop.3 abx.4    start.4
1         1  Moxi 2014-01-01 2014-01-07  PenG 2014-01-01 2014-01-07 Vanco 2014-01-01 2014-01-07  Moxi 2014-01-01
2         2  Moxi 2014-01-01 2014-01-02 Cipro 2014-01-01 2014-01-02  PenG 2014-01-01 2014-01-02 Vanco 2014-01-01
3         3 Cipro 2014-01-01 2014-01-05 Vanco 2014-01-01 2014-01-05 Cipro 2014-01-01 2014-01-05 Vanco 2014-01-01
4         4 Vanco 2014-01-02 2014-01-03 Cipro 2014-01-02 2014-01-03 Cipro 2014-01-02 2014-01-03  PenG 2014-01-02
5         5 Vanco 2014-01-01 2014-01-02  PenG 2014-01-01 2014-01-02  PenG 2014-01-01 2014-01-02 Cipro 2014-01-01
      stop.4    intervention
1 2014-01-07       0
2 2014-01-02       0
3 2014-01-05       1
4 2014-01-03       1
5 2014-01-02       0

With some code to create this:

abxoptions <- c("Cipro", "Moxi", "PenG", "Vanco")
df3 <- data.frame(
  unique.id = 1:5,
  abx.1 = sample(abxoptions, 5, replace = TRUE),
  start.1 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.1  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  abx.2 = sample(abxoptions, 5, replace = TRUE),
  start.2 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.2  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  abx.3 = sample(abxoptions, 5, replace = TRUE),
  start.3 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.3  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  abx.4 = sample(abxoptions, 5, replace = TRUE),
  start.4 = as.Date(c('2014-01-01', '2014-01-01', '2014-01-01', '2014-01-02', '2014-01-01')),
  stop.4  = as.Date(c('2014-01-07', '2014-01-02', '2014-01-05', '2014-01-03', '2014-01-02')),
  intervention = c(0, 0, 1, 1, 0)
)

I would like to tidy this data to look like this:

unique.id    abx     start      stop        intervention
1            Moxi    2014-01-01 2014-01-07  0
1            PenG    2014-01-01 2014-01-07  0
1            Vanco   2014-01-01 2014-01-07  0
1            Moxi    2014-01-01 2014-01-07  0
etc.

I've spent the last two afternoons trying to solve this. I looked at the following solutions: "Gather multiple sets of columns" and "Combining multiple columns into one".

I wish I could wrap my head around this problem. I feel like the solution should be quite easy, but I just can't figure it out. Sorry, not a coder here. I suspect that Hadley's amazing tidyr package is the way to go, but I can't work out how to use it. Any help would be greatly appreciated.

Solution

Almost every data tidying problem can be solved in three steps:

  1. Gather all non-variable columns
  2. Separate "colname" column into multiple variables
  3. Re-spread the data

(often you'll only need one or two of these, but I think they're almost always in this order).
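As a minimal sketch of these three steps in isolation, here is a toy example. The data frame `toy` and its column names are made up for illustration and are not from the question; all measured columns are numeric, so no type restoration is needed at the end.

```r
library(tidyr)

# Hypothetical toy data: two measurements (height, weight) taken at two visits
toy <- data.frame(
  id = 1:2,
  height.1 = c(150, 160),
  height.2 = c(151, 161),
  weight.1 = c(50, 60),
  weight.2 = c(52, 61)
)

# Step 1: gather every column that is not already a variable
long <- gather(toy, col, value, -id)

# Step 2: split the old column names ("height.1") into two variables;
# separate() splits on non-alphanumeric characters by default
parts <- separate(long, col, c("variable", "visit"))

# Step 3: spread "variable" back out so height and weight become columns
tidy <- spread(parts, variable, value)
```

The result has one row per `id` and `visit`, with `height` and `weight` as ordinary columns.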

For your data:

  1. The only columns that are already variables are unique.id and intervention
  2. You need to split current column names into variable and number
  3. Then you need to put the "variable" variable back into columns

This looks like:

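library(tidyr)
library(dplyr)

df3 %>%
  # 1. stack every abx/start/stop column into key-value pairs
  gather(col, value, -unique.id, -intervention) %>%
  # 2. split names such as "abx.1" into "abx" and "1"
  separate(col, c("variable", "number")) %>%
  # 3. put abx, start and stop back into their own columns
  spread(variable, value, convert = TRUE) %>%
  # the dates come back as day counts, so convert them to Date again
  mutate(start = as.Date(start, origin = "1970-01-01"),
         stop = as.Date(stop, origin = "1970-01-01"))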

Your case is a bit more complicated because you have two types of variables, so you need to restore the types at the end.
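If a newer tidyr is available (version 1.0.0 or later, which is an assumption about your setup and not part of the original answer), `pivot_longer()` with the special `".value"` name may avoid the type problem altogether, because each output column is built only from input columns of the same type. The sketch below uses a hypothetical two-row, two-course cut-down of the data called `df_small` so that it is self-contained.

```r
library(tidyr)

# Hypothetical cut-down version of df3 (two patients, two antibiotic courses)
df_small <- data.frame(
  unique.id = 1:2,
  abx.1 = c("Moxi", "Cipro"),
  start.1 = as.Date(c("2014-01-01", "2014-01-02")),
  stop.1 = as.Date(c("2014-01-07", "2014-01-03")),
  abx.2 = c("PenG", "Vanco"),
  start.2 = as.Date(c("2014-01-01", "2014-01-02")),
  stop.2 = as.Date(c("2014-01-07", "2014-01-03")),
  intervention = c(0, 1),
  stringsAsFactors = FALSE
)

# ".value" says: the part of the name before the dot (abx, start, stop)
# becomes an output column; the part after the dot goes into "number"
tidy <- pivot_longer(
  df_small,
  cols = -c(unique.id, intervention),
  names_to = c(".value", "number"),
  names_sep = "\\."
)
```

Here `start` and `stop` stay as `Date` columns and `abx` stays character, so no final `mutate()` is needed.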
