Reshaping multiple sets of measurement columns (wide format) into single columns (long format)

Question

I have a data frame in wide format, with repeated measurements taken within different date ranges. In my example there are three different periods, each with its corresponding value. For example, the first measurement (Value1) was taken in the period from DateRange1Start to DateRange1End:

ID DateRange1Start DateRange1End Value1 DateRange2Start DateRange2End Value2 DateRange3Start DateRange3End Value3
1 1/1/90 3/1/90 4.4 4/5/91 6/7/91 6.2 5/5/95 6/6/96 3.3 

I'm looking to reshape the data to a long format such that the DateRangeXStart and DateRangeXEnd columns are grouped. Thus, what was 1 row in the original table becomes 3 rows in the new table:

ID DateRangeStart DateRangeEnd Value
1 1/1/90 3/1/90 4.4
1 4/5/91 6/7/91 6.2
1 5/5/95 6/6/96 3.3

I know there must be a way to do this with reshape2/melt/recast/tidyr, but I can't seem to figure out how to map the multiple sets of measure variables into single sets of value columns in this particular way.

Recommended answer

Reshaping from wide to long format with multiple value/measure columns is possible with the function pivot_longer() of the tidyr package since version 1.0.0.

This is superior to the previous tidyr strategy of gather() followed by spread() (see answer by @AndrewMacDonald), because the attributes are no longer dropped (dates remain dates and numerics remain numerics in the example below).

library("tidyr")
library("magrittr")

a <- structure(list(ID = 1L, 
                    DateRange1Start = structure(7305, class = "Date"), 
                    DateRange1End = structure(7307, class = "Date"), 
                    Value1 = 4.4, 
                    DateRange2Start = structure(7793, class = "Date"),
                    DateRange2End = structure(7856, class = "Date"), 
                    Value2 = 6.2, 
                    DateRange3Start = structure(9255, class = "Date"), 
                    DateRange3End = structure(9653, class = "Date"), 
                    Value3 = 3.3),
               row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))

pivot_longer() (counterpart: pivot_wider()) works similarly to gather(). However, it offers additional functionality such as multiple value columns. With only one value column, all column names of the wide data set would go into one long column with the name given in names_to. For multiple value columns, names_to may receive multiple new names.
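To make the single-value-column case concrete, here is a minimal sketch. The data frame `wide` and the column names `measurement` and `value` are hypothetical and only serve as an illustration; they are not part of the original example.

```r
library(tidyr)

# Hypothetical toy data: two measurement columns of the same kind
wide <- data.frame(ID = 1:2,
                   Value1 = c(4.4, 1.5),
                   Value2 = c(6.2, 2.5))

# With a single value column, the old column names end up in one
# long column (named via names_to) and the cell values in another
# (named via values_to)
long <- pivot_longer(wide,
                     cols = -ID,
                     names_to = "measurement",
                     values_to = "value")
long
```

Each input row yields one output row per pivoted column, so the two rows above become four.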

This is easiest if all column names follow a specific pattern like Start_1, End_1, Start_2, etc. Therefore, I renamed the columns in the first step.

(names(a) <- sub("(\\d)(\\w*)", "\\2_\\1", names(a)))
#>  [1] "ID"               "DateRangeStart_1" "DateRangeEnd_1"  
#>  [4] "Value_1"          "DateRangeStart_2" "DateRangeEnd_2"  
#>  [7] "Value_2"          "DateRangeStart_3" "DateRangeEnd_3"  
#> [10] "Value_3"
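The renaming pattern can be checked on a plain character vector. In the sketch below, `(\\d)` captures the first digit and `(\\w*)` the word characters that follow it; the replacement swaps the two and puts an underscore between them. The vector `old_names` is only an illustration. Note that `(\\d)` matches a single digit, so the second pattern shown (an assumption, not part of the original answer) may be a safer choice if there are ten or more periods.

```r
# Stand-alone check of the renaming pattern on a plain character vector
old_names <- c("ID", "DateRange1Start", "DateRange1End", "Value1")
new_names <- sub("(\\d)(\\w*)", "\\2_\\1", old_names)
new_names
# "ID" "DateRangeStart_1" "DateRangeEnd_1" "Value_1"

# Hypothetical variant for multi-digit group numbers:
# capture the whole run of digits, then the non-digit remainder
multi_digit <- sub("(\\d+)(\\D*)", "\\2_\\1", "DateRange10Start")
multi_digit
# "DateRangeStart_10"
```

Names without a digit, such as ID, do not match and are left unchanged.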

pivot_longer(a, 
             cols = -ID, 
             names_to = c(".value", "group"),
             # names_prefix = "DateRange",
             names_sep = "_")
#> # A tibble: 3 x 5
#>      ID group DateRangeEnd DateRangeStart Value
#>   <int> <chr> <date>       <date>         <dbl>
#> 1     1 1     1990-01-03   1990-01-01       4.4
#> 2     1 2     1991-07-06   1991-05-04       6.2
#> 3     1 3     1996-06-06   1995-05-05       3.3
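The claim that attributes survive the reshape can be verified directly. The following self-contained sketch uses a hypothetical two-period data frame `wide` whose columns already follow the `<name>_<group>` pattern, and inspects the column classes after pivoting.

```r
library(tidyr)

# Hypothetical two-period data set, columns already renamed
wide <- data.frame(
  ID               = 1L,
  DateRangeStart_1 = as.Date("1990-01-01"),
  DateRangeEnd_1   = as.Date("1990-01-03"),
  Value_1          = 4.4,
  DateRangeStart_2 = as.Date("1991-05-04"),
  DateRangeEnd_2   = as.Date("1991-07-06"),
  Value_2          = 6.2
)

# ".value" means: the part of the name before "_" becomes the output
# column name; the part after "_" goes into the new column "group"
long <- pivot_longer(wide,
                     cols = -ID,
                     names_to = c(".value", "group"),
                     names_sep = "_")

# First class of every column: dates stay Date, values stay numeric
sapply(long, function(x) class(x)[1])
```

The `group` column comes out as character, because it is parsed from the column names; convert it with as.integer() if a numeric period index is needed.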

Alternatively, the reshape may be done using a pivot spec that offers finer control (see link below):

spec <- a %>%
    build_longer_spec(cols = -ID) %>%
    dplyr::transmute(.name = .name,
                     group = readr::parse_number(name),
                     .value = stringr::str_extract(name, "Start|End|Value"))

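# In released versions of tidyr (>= 1.0.0) a spec is applied with
# pivot_longer_spec(); pivot_longer() no longer takes a spec argument.
pivot_longer_spec(a, spec)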
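As a self-contained illustration of the spec workflow, the sketch below builds the spec from the column names alone and applies it with pivot_longer_spec(). The data frame `wide` is hypothetical, and the spec here is derived via names_sep rather than by the string extraction shown above, so the output columns keep their full names.

```r
library(tidyr)

# Hypothetical two-period data set, columns already renamed
wide <- data.frame(
  ID               = 1L,
  DateRangeStart_1 = as.Date("1990-01-01"),
  DateRangeEnd_1   = as.Date("1990-01-03"),
  Value_1          = 4.4,
  DateRangeStart_2 = as.Date("1991-05-04"),
  DateRangeEnd_2   = as.Date("1991-07-06"),
  Value_2          = 6.2
)

# The spec is an ordinary data frame with one row per input column:
# .name (input column), .value (output column) and group
spec <- build_longer_spec(wide,
                          cols = -ID,
                          names_to = c(".value", "group"),
                          names_sep = "_")
spec

# Apply the (possibly hand-edited) spec to the data
long <- pivot_longer_spec(wide, spec)
long
```

Because the spec is a plain data frame, it can be inspected and modified with the usual tools before it is applied, which is where the finer control comes from.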

Created on 2019-03-26 by the reprex package (v0.2.1)

See also: https://tidyr.tidyverse.org/articles/pivot.html
